MA4254: Discrete Optimization
Defeng Sun
Department of Mathematics
National University of Singapore
Office: S14-04-25
Telephone: 6516 3343
Aims/Objectives: Discrete optimization deals with problems of maximizing or minimizing a function over a feasible region of discrete structure. These problems come from many fields, such as operations research, management science, and computer science. The primary objective of this course is twofold: a) to study key techniques for separating easy problems from difficult ones, and b) to use typical methods to deal with difficult problems.
Mode of Evaluation: Tutorial class performance (10%); Mid-Term
test (20%) and Final examination (70%)
This course is taught at the Department of Mathematics, National University of Singapore, Semester I, 2009/2010. E-mail: [email protected]
References:
1) D. Bertsimas and J. N. Tsitsiklis, Introduction to Linear Optimization. Athena Scientific, 1997.
2) G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization. John Wiley and Sons, 1999.
3) C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982. Second edition by Dover, 1998.
PARTIAL lecture notes will be made available on my webpage
http://www.math.nus.edu.sg/~matsundf/
1 Introduction
In this chapter we will briefly discuss the problems we are going to study, give a short review of the simplex method for solving linear programming problems, and introduce some basic concepts in graphs and digraphs.
1.1 Linear Programming (LP): a short review
Consider the following linear programming problem

(P)  min c^T x
     s.t. Ax ≥ b, x ≥ 0

and its dual

(D)  max b^T y
     s.t. A^T y ≤ c, y ≥ 0.
Simplex method: Dantzig (1947). Very efficient in practice, but not a polynomial-time algorithm; Klee and Minty (1972) gave a counterexample.
Average-case analysis versus worst-case analysis.
Ellipsoid method: a polynomial-time algorithm (Khachiyan, 1979), but less efficient in practice.
Interior-point algorithms: Karmarkar (1984). Polynomial-time algorithms, efficient for some large-scale sparse LPs.
Others.
1.2 Discrete Optimization (DO)
Also called Combinatorial Optimization (CO).
The general mathematical formulation:

min f(x)
s.t. x ∈ F,

where x is a decision policy, F is the collection of feasible decision policies, and f(x) measures the value of members of F. A typical DO (CO) problem:

(IP)  min c^T x
      s.t. Ax ≥ b, x ≥ 0,
           xj integer for j ∈ I ⊆ N := {1, . . . , n},

where c, A and b are given problem data.
2. The Assignment Problem
There are n people and m jobs, where n ≥ m. Each job must be assigned to exactly one person, and each person can do at most one job.
The cost of person j doing job i is cij. Then the Assignment Problem can be formulated as

min ∑_{i=1}^{m} ∑_{j=1}^{n} cij xij
s.t. ∑_{j=1}^{n} xij = 1, i = 1, . . . , m,
     ∑_{i=1}^{m} xij ≤ 1, j = 1, . . . , n,
     x ∈ B^{m×n}.
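For very small instances, the model above can be checked by brute force: every injective map of jobs to people is a feasible 0/1 solution. A minimal pure-Python sketch (the cost data below are made up):

```python
from itertools import permutations

def solve_assignment(c):
    """Brute-force the assignment model above: c[i][j] is the cost of
    person j doing job i, with m jobs and n >= m people.  Each job goes
    to exactly one person; each person takes at most one job."""
    m, n = len(c), len(c[0])
    best_cost, best = float("inf"), None
    # every injective map job -> person is a feasible 0/1 solution
    for people in permutations(range(n), m):
        cost = sum(c[i][people[i]] for i in range(m))
        if cost < best_cost:
            best_cost, best = cost, people
    return best_cost, best

# a hypothetical 3-job, 3-person instance; best[i] is the person for job i
c = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
print(solve_assignment(c))
```

The enumeration is only a correctness reference; in practice one exploits total unimodularity (Section 2) or a combinatorial method such as the Hungarian algorithm.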
Extensions: the Three-Index Assignment Problem.
3. Set-Covering, Set-Packing, and Set-Partitioning Problems
The Set-Covering Problem is

min c^T x
s.t. Ax ≥ 1, x ∈ B^n.

The Set-Packing Problem is

max c^T x
s.t. Ax ≤ 1, x ∈ B^n.
4. Traveling Salesman Problem (TSP)
We are given a set of nodes V = {1, . . . , n} and a set of arcs A. The nodes represent cities, and the arcs represent ordered pairs of cities between which direct travel is possible.
For (i, j) ∈ A, cij is the direct travel time from city i to city j. The TSP is to find a tour, starting at city 1, that
(a) visits each other city exactly once and then returns to city 1, and
(b) takes the least total travel time.
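For a handful of cities the tour can be found by trying every ordering of the remaining cities. A sketch (cities are indexed from 0, and the travel times are a made-up symmetric example):

```python
from itertools import permutations

def tsp_bruteforce(c):
    """Brute-force the TSP above: c[i][j] is the direct travel time from
    city i to city j.  City 0 is fixed as the start; every order of the
    remaining cities is tried, and the tour returns to city 0."""
    n = len(c)
    best_time, best_tour = float("inf"), None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        time = sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if time < best_time:
            best_time, best_tour = time, tour
    return best_time, best_tour

c = [[0, 1, 2, 9],
     [1, 0, 6, 4],
     [2, 6, 0, 3],
     [9, 4, 3, 0]]
print(tsp_bruteforce(c))
```

The (n − 1)! enumeration illustrates the exponential growth discussed in Section 1.4 and is usable only for tiny n.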
5. Facility Location Problem, Network Flow Problem, and many more
1.4 Why is DO (CO) difficult?
The superficial reason is that the number of arrangements grows exponentially.
Topics: Total Unimodularity (TU) Theory; Shortest Path; Matroids and the Greedy Algorithm; Complexity (the P ≠ NP conjecture); Interior-Point Algorithms; Cutting Planes; Branch and Bound; Decomposition; Flowshop Scheduling, etc.
1.5 Convex sets
In linear programming and nonlinear programming, we have already met many convex sets, for example, the line segment between two points in R^n.
1.6 Hyperplanes and half spaces
Definition 1.2 Let a be a nonzero vector in R^n and let b be a scalar. Then the set {x ∈ R^n | a^T x = b} is called a hyperplane, and the set {x ∈ R^n | a^T x ≥ b} is called a halfspace.
Let x0 be any point on the hyperplane {x ∈ R^n | a^T x = b}. Then for any x on the hyperplane, a^T (x − x0) = 0, so a is normal to the hyperplane.
It is noted that these halfspaces are finite in number. The intersection of two polyhedra is again a polyhedron. So {x ∈ R^n | Ax ≥ b}, being an intersection of finitely many halfspaces, is a polyhedron.
over which we are optimizing. There are quite a number of different but equivalent ways to define the concept of a corner. Here we introduce two of them: extreme points and basic feasible solutions.
Our first definition defines an extreme point of a polyhedron as a point that cannot be expressed as a convex combination of two other points of the polyhedron.
Definition 1.6 Let P be a polyhedron. A vector x ∈ P is an extreme point of P if we cannot find two vectors y, z ∈ P, both different from x, and a scalar λ ∈ [0, 1], such that x = λy + (1 − λ)z.
where a1 = (0, 0, 2)^T, a2 = (4, 0, 0)^T and a3 = (1, 1, 1)^T. Let a4 = e1, a5 = e2 and a6 = e3. Then
M1 = {1, 4, 5, 6}, M2 = {2}, M3 = {3}.
Definition 1.7 If a vector x satisfies a_i^T x = bi for some i in M1, M2 or M3, we say that the corresponding constraint is active or binding at x. The active set of P at x is defined as
I(x) = {i ∈ M1 ∪ M2 ∪ M3 | a_i^T x = bi},
i.e., I(x) is the set of indices of constraints that are active at x.
For example, suppose that P is defined by (1.1). Let x = (0.5, 0, 0.5)^T. The constraints active at x are
a_1^T x = 1,  a_3^T x = 1,  a_5^T x (= x2) = 0,
and
I(x) = {1, 3, 5}.
Recall that vectors x1, . . . , xk ∈ R^n are linearly independent if λ1 x1 + · · · + λk xk = 0 implies λ1 = · · · = λk = 0.
(a) There exist n vectors in the set {ai | i ∈ I(x)} which are linearly independent.
(b) The span of the vectors ai, i ∈ I(x), is all of R^n.
which is orthogonal to the subspace spanned by these vectors. If x satisfies a_i^T x = bi for all i ∈ I(x), we also have a_i^T (x + d) = bi for all i ∈ I(x), thus obtaining multiple solutions. We have therefore established that (b) and (c) are equivalent. Q.E.D.
With a slight abuse of language, we will often say that certain constraints are linearly independent, meaning that the corresponding vectors ai are linearly independent. We are now ready to provide an algebraic definition of a corner point of the polyhedron P.
Definition 1.8 Let x* ∈ R^n.
(a) The vector x* is a basic solution if all equality constraints are active at x* and, among the constraints active at x*, there are n of them that are linearly independent.
(b) If x* is a basic solution that satisfies all of the constraints, it is a basic feasible solution.
Note that if the number m of constraints used to define a polyhedron P is less than n, then fewer than n constraints can be active at any given point, and P has no basic (or basic feasible) solutions.
The set P = {x
Definition 1.11
(a) A nonzero element d of a polyhedral cone C ⊆ R^n is called an extreme ray if there are n − 1 linearly independent constraints that are active at d.
1.10 Simplex Method Revisited
Consider the standard linear programming problem
(P)  min c^T x
     s.t. Ax = b,
          x ≥ 0,    (1.2)

where A is an m × n matrix with full row rank, b ∈ R^m and c ∈ R^n.
Finding an initial basic feasible solution: the artificial variables method and the big-M method.
For the dual simplex method, we have
    0    |  c1  . . .  cn
   b1    |   |          |
   ...   |  A1  . . .  An
   bm    |   |          |

and

 −c_B^T x_B |    c̄1    . . .    c̄n
   x_B(1)   |     |              |
    ...     | B^{-1}A1 . . . B^{-1}An
   x_B(m)   |     |              |

We do not require B^{-1}b to be nonnegative, which means that we have a basic, but not necessarily feasible, solution to the primal problem. However, we assume that c̄ ≥ 0; equivalently, the vector y^T = c_B^T B^{-1} satisfies y^T A ≤ c^T, and we have a feasible solution to the dual problem. The cost of this dual feasible solution is y^T b = c_B^T B^{-1} b = c_B^T x_B, which is the negative of the entry at the upper left corner of the tableau.
1.11 Graphs and Digraphs
1.11.1 Graphs
Definition 1.12 A graph G is a pair (V, E), where V is a finite set and E is a set of unordered pairs of elements of V. Elements of V are called vertices and elements of E edges. We say that a pair of distinct vertices are adjacent if they define an edge, and the edge is said to be incident to its defining vertices. The degree of a vertex v (denoted deg(v)) is the number of edges incident to that vertex.
An example is shown in Figure 1.3.
Figure 1.3: A graph with edges e1, e2, e3, e4
Definition 1.13 A v1vk-path (or path connecting v1 and vk) is a sequence of edges
v1v2, . . . , v_{i−1}v_i, . . . , v_{k−1}v_k.
A cycle is a sequence of edges
v1v2, . . . , v_{i−1}v_i, . . . , v_{k−1}v_k, v_k v1.
In both cases the vertices are all distinct. A graph is acyclic if it has no cycle.
Proposition 1.2 If every vertex of G has degree of at least two then G has a cycle.
Proof. Let P = v1v2, . . . , v_{k−1}v_k be a path of G with a maximum number of edges. Since deg(vk) ≥ 2, there is an edge vk w where w ≠ v_{k−1}. It follows from the choice of P that w is a vertex of P, i.e., w = vi for some i ∈ {1, . . . , k − 2}. Then vi v_{i+1}, . . . , v_{k−1}vk, vk vi is a cycle. Q.E.D.
Definition 1.14 G is connected if each pair of vertices is connected by a path.
Proposition 1.3 Let G be a connected graph with a cycle C and let e be an edge of C. Then G − e is connected.
Proof. Let v1, v2 be vertices of G − e. We need to show that there exists a v1v2-path of G − e. Since G is connected there exists a v1v2-path P of G. If P does not use e then we are done. Otherwise P implies there exist a v1w1-path P1 and a w2v2-path P2, where w1, w2 are the endpoints of e. Moreover, C − w1w2 is a w1w2-path. The result now follows. Q.E.D.
Definition 1.15 H is a subgraph of G if V(H) ⊆ V(G) and E(H) ⊆ E(G). It is a spanning subgraph if in addition V(H) = V(G).
Definition 1.16 A tree is a connected acyclic graph.
Theorem 1.2 If T = (V, E) is a tree, then |E| = |V| − 1.
Proof. Let us proceed by induction on the number of vertices of V. The base case |V| = 1 is trivial since then |E| = 0. Assume now |V| ≥ 2 and suppose the theorem holds for all trees with |V| − 1 vertices. Since T is acyclic, it follows from Proposition 1.2 that there is a vertex v with deg(v) ≤ 1. Since T is connected and |V| ≥ 2, deg(v) ≠ 0. Thus, there is a unique edge uv incident to v. Let T′ be defined as follows: V(T′) = V − {v} and E(T′) = E − {uv}. Observe that T′ is a tree. Hence by induction |E(T′)| = |V(T′)| − 1 and it follows that |E| = |V| − 1. Q.E.D.
Proposition 1.4 Let G = (V, E) be a connected graph. Then |E| ≥ |V| − 1. Moreover, if equality holds then G is a tree.
Proof. If G has a cycle then remove from G any edge on the cycle. Repeat until
the resulting graph T is acyclic. It follows from Proposition 1.3 that T is connected.
Hence T is a tree and by Theorem 1.2,
|E(G)| ≥ |E(T)| = |V(G)| − 1.
Q.E.D.
1.11.2 Bipartite Graph
G = (S, T, E): every edge in E has one vertex in S and the other in T.
1.11.3 Vertex-Edge Incidence Matrix
Definition 1.17 The vertex-edge incidence matrix of a graph G = (V, E) is a matrix A with |V| rows and |E| columns whose entries are either 0 or 1, such that the rows correspond to the vertices of G, the columns correspond to the edges of G, and the entry A_{v,ij} for vertex v and edge ij is given by
A_{v,ij} = 0 if v ≠ i and v ≠ j;  1 if v = i or v = j.
1.11.4 Digraphs (Directed Graphs)
Definition 1.18 A directed graph (or digraph) D is a pair (N,A) where N is a finite
set and A is a set of ordered pairs of elements of N . Elements of N are called nodes
and elements of A arcs. Node i (resp. j) is the tail (resp. head) of arc ij. The in-degree (resp. out-degree) of node v, denoted deg⁺(v) (resp. deg⁻(v)), is the number of arcs with head (resp. tail) v.
1.11.5 Bipartite Digraph
D = (S, T,A)
1.11.6 Node-Arc Incidence Matrix
Definition 1.19 The node-arc incidence matrix of a digraph D = (N, A) is a matrix M with |N| rows and |A| columns whose entries are either 0, +1, or −1, such that the rows correspond to the nodes of D, the columns correspond to the arcs of D, and the entry M_{v,ij} for node v and arc
ij is given by
M_{v,ij} = 0 if v ≠ i and v ≠ j;  +1 if v = j;  −1 if v = i.
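Both matrices can be built directly from Definitions 1.17 and 1.19; the helper names below are ours:

```python
def vertex_edge_incidence(vertices, edges):
    """Definition 1.17: A[v][k] = 1 iff vertex v is an endpoint of the
    k-th edge, 0 otherwise; edges are unordered pairs (i, j)."""
    return [[1 if v in e else 0 for e in edges] for v in vertices]

def node_arc_incidence(nodes, arcs):
    """Definition 1.19: M[v][k] = +1 if v is the head of the k-th arc,
    -1 if v is its tail, 0 otherwise; arcs are ordered pairs (i, j)."""
    return [[+1 if v == j else (-1 if v == i else 0) for (i, j) in arcs]
            for v in nodes]
```

Each column of the node-arc matrix has exactly one +1 and one −1, which is what makes Theorem 2.3 below applicable.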
2 Total Unimodularity (TU) and Its Applications
In this section we will discuss the total unimodularity theory and its applications to
flows in networks.
2.1 Total Unimodularity: Definition and Properties
Consider the following integer linear programming problem
(P)  max c^T x
     s.t. Ax = b,
          x ≥ 0,    (2.1)

where A ∈ Z^{m×n}, b ∈ Z^m and c ∈ Z^n are all integral.
Definition 2.1 A square, integer matrix B is called unimodular if |Det(B)| = 1. An integer matrix A is called totally unimodular if every square, nonsingular submatrix of A is unimodular.
The above definition means that a TU matrix is a {1, 0, −1}-matrix. But a {1, 0, −1}-matrix is not necessarily a TU matrix, e.g.,

A = (  1  1
      −1  1 ).
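Definition 2.1 can be tested by brute force on small matrices, since it is equivalent to requiring that every square submatrix has determinant 0, +1 or −1. A sketch (exponential in the matrix size, for illustration only):

```python
from itertools import combinations

def det(M):
    """Integer determinant by cofactor expansion along the first row
    (exact, and fast enough for the tiny submatrices used here)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_totally_unimodular(A):
    """Check Definition 2.1 directly on an integer matrix A: every
    square submatrix must have determinant 0, +1 or -1."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if abs(det([[A[r][c] for c in cols] for r in rows])) > 1:
                    return False
    return True

print(is_totally_unimodular([[1, 1], [-1, 1]]))  # the counterexample above
```

The 2 × 2 counterexample fails because its own determinant is 2.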
Lemma 2.1 Suppose that A ∈ Z^{n×n} is a unimodular matrix and that b ∈ Z^n is an integer vector. If A is nonsingular, then Ax = b has the unique integer solution x = A^{-1}b.
Proof. Let aij be the ij-th entry of A, i, j = 1, . . . , n. For any aij, define the cofactor of aij as
Cof(aij) = (−1)^{i+j} Det(A(i|j)),
where A(i|j) is the matrix obtained by removing the i-th row and the j-th column of A. Then
Det(A) = ∑_{i=1}^{n} ai1 Cof(ai1).
The adjoint of A is
Adj(A) = Adj({aij}) = {Cof(aij)}^T
and the inverse of A is
A^{-1} = (1 / Det(A)) Adj(A).
Since A ∈ Z^{n×n} is a unimodular nonsingular integer matrix, every Cof(aij) is an integer and |Det(A)| = 1. Hence A^{-1} is an integer matrix and x = A^{-1}b is integer whenever b is. Q.E.D.
Theorem 2.1 If A is TU, every basic solution to P is integer.
Proof. Suppose that x is a basic solution to P. Let N be the set of indices of x such that xj = 0. Since x is a basic solution to P, there exist two nonnegative integers p and q with p + q = n and indices B(1), . . . , B(p) ∈ {1, . . . , m} and N(1), . . . , N(q) ∈ N such that
{A^T_{B(i)}}_{i=1}^{p} ∪ {e^T_{N(j)}}_{j=1}^{q}
are linearly independent, where e_{N(j)} is the N(j)-th unit vector in R^n.
Proposition 2.3 A ∈ Z^{m×n} is TU ⟺ (A, I) is TU, where I is the m × m identity matrix.
Obviously, |Det(B)| = |Det(B′)| and
|Det(B′)| = |Det(A1)| |Det(I′)| = |Det(A1)|.
Now A being totally unimodular implies |Det(A1)| = 0 or 1, and since B is assumed to be nonsingular, |Det(B)| = 1. Again, from Lemma 2.1, yB is integer. Hence y is integer because yj = 0, j ∉ B. This implies that x is integer. [One may also make use of Theorem 2.1 and Proposition 2.3 to get the proof immediately.]
(2 ⟹ 3). Let B ∈ Z^{p×p} be any square nonsingular submatrix of A. It is sufficient to prove that b̄j is an integer vector, where b̄j is the jth column of B^{-1}, j = 1, . . . , p.
Let t be an integer vector such that t + b̄j > 0 and bB(t) = Bt + ej, where ej is the jth unit vector. Then
xB = B^{-1} bB(t) = B^{-1}(Bt + ej) = t + B^{-1} ej = t + b̄j > 0.
Choose bN (N = {1, . . . , n}\B) sufficiently large such that (Ax)j < bj, j ∈ N, where xj = 0, j ∈ N. Hence x is an extreme point of S(b(t)). As xB and t are integer vectors, b̄j is an integer vector too for j = 1, . . . , p, and B^{-1} is integer.
(3 ⟹ 1). Let B be an arbitrary square, nonsingular submatrix of A. Then
1 = |Det(I)| = |Det(BB^{-1})| = |Det(B)| |Det(B^{-1})|.
By the assumption, B and B^{-1} are integer matrices. Thus
|Det(B)| = |Det(B^{-1})| = 1,
and A is TU. Q.E.D.
Theorem 2.3 (A sufficient condition for TU) An integer matrix A with all aij = 0, +1, or −1 is TU if
1. no more than two nonzero elements appear in each column,
2. the rows of A can be partitioned into two subsets M1 and M2 such that
(a) if a column contains two nonzero elements with the same sign, one element
is in each of the subsets,
(b) if a column contains two nonzero elements of opposite signs, both elements
are in the same subset.
Proof. The proof is by induction. A one-element submatrix of A has determinant equal to 0, +1, or −1.
Assume that the theorem is true for all submatrices of A of order k − 1 or less, and let B be a square submatrix of A of order k.
If B contains a column with only one nonzero element, we expand Det(B) by that
column and apply the induction hypothesis.
Finally, consider the case in which every column of B contains two nonzero elements. Then from 2(a) and 2(b), for every column j,
∑_{i∈M1} bij = ∑_{i∈M2} bij,  j = 1, . . . , k.
Let b^i be the ith row of B. Then the above equality gives
∑_{i∈M1} b^i − ∑_{i∈M2} b^i = 0,
which implies that the rows {b^i}, i ∈ M1 ∪ M2, are linearly dependent and thus B is singular, i.e., Det(B) = 0. Q.E.D.
Corollary 2.1 The vertex-edge incidence matrix of a bipartite graph is TU.
Corollary 2.2 The node-arc incidence matrix of a digraph is TU.
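The hypotheses of Theorem 2.3 can also be verified mechanically: finding the row partition (M1, M2) is a 2-colouring problem, since a same-sign pair in a column forces its two rows into different subsets and an opposite-sign pair forces them into the same subset. A sketch (function name is ours):

```python
def satisfies_sufficient_condition(A):
    """Check the hypotheses of Theorem 2.3 for a {0, +1, -1} matrix A:
    at most two nonzeros per column, and a row partition (M1, M2) with
    same-sign pairs split between the subsets and opposite-sign pairs
    kept together.  Finding the partition is a 2-colouring problem."""
    m = len(A)
    # constraints[i] holds (k, same): rows i and k must be in the same
    # subset (same=True) or in different subsets (same=False)
    constraints = [[] for _ in range(m)]
    for col in zip(*A):
        nz = [i for i, a in enumerate(col) if a != 0]
        if len(nz) > 2:
            return False
        if len(nz) == 2:
            i, k = nz
            same = col[i] != col[k]   # opposite signs -> same subset
            constraints[i].append((k, same))
            constraints[k].append((i, same))
    colour = [None] * m
    for s in range(m):                # colour each component by search
        if colour[s] is not None:
            continue
        colour[s], stack = 0, [s]
        while stack:
            i = stack.pop()
            for k, same in constraints[i]:
                want = colour[i] if same else 1 - colour[i]
                if colour[k] is None:
                    colour[k] = want
                    stack.append(k)
                elif colour[k] != want:
                    return False      # no valid partition exists
    return True
```

For a node-arc incidence matrix every column has one +1 and one −1, so putting all rows in M1 works, consistent with Corollary 2.2; the vertex-edge incidence matrix of an odd cycle fails the test, consistent with Corollary 2.1 applying only to bipartite graphs.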
2.2 Applications
In this section we show that the assumptions of the theorems in Section 2.1 are fulfilled for integer programming problems connected with optimization of flows in networks. This means that these problems can be solved by the SIMPLEX METHOD.
However, it is not necessary to use the simplex method, because more efficient methods have been developed by taking into consideration the specific structure of these problems.
Many commodities, such as gas, oil, etc., are transported through networks in which
we distinguish sources, intermediate transportation or distribution points and desti-
nation points.
We will represent a network as a directed graph G = (V, E) and associate with each arc (i, j) ∈ E the flow xij of the commodity and the capacity dij (possibly infinite) that bounds the flow through the arc. The set V is partitioned into three sets:
V1, the set of sources or origins; V2, the set of intermediate points; V3, the set of destinations or sinks.
Figure 2.1: A network
For each i ∈ V1, let ai be a supply of the commodity, and for each i ∈ V3, let bi be a demand for the commodity.
We assume that there is no loss of the flow at intermediate points. Additionally, denote
V⁺(i) = {j | (i, j) ∈ E}  and  V⁻(i) = {j | (j, i) ∈ E},
the sets of out-neighbors and in-neighbors of i, respectively.
Then the minimum cost capacitated problem may be formulated as
(P) v(P ) = min
(i,j)Ecijxij
subject to
jV (i)
xij
jV (i)xji
ai, i V1,= 0, i V2, bi, i V3,
(2.2)
0 xij dij, (i, j) E. (2.3)
Constraint (2.2) requires the conservation of flow at intermediate points, a net flow
into sinks at least as great as demanded, and a net flow out of sources equal or less
than the supply. In some applications, demand must be satisfied exactly and all of
the supply must be used. If all of the constraints of (2.2) are equalities, the problem
has no feasible solutions unless
∑_{i∈V1} ai = ∑_{i∈V3} bi.
To avoid pathological cases, we assume for each cycle in the network G = (V,E)
either that the sum of costs of arcs in the cycle is positive or that the minimal
capacity of an arc in the cycle is bounded.
Theorem 2.4 The constraint matrix corresponding to (2.2) and (2.3) is totally uni-
modular.
Proof. The constraint matrix has the form
A = ( A1
       I ),
where A1 is the matrix for (2.2) and I is an identity matrix for (2.3). In the last section, we showed that A1 being totally unimodular implies that A is totally unimodular.
Each variable xij appears in exactly two constraints of (2.2), with coefficients +1 or −1. Thus A1 is an incidence matrix for a digraph and therefore it is totally unimodular. Q.E.D.
The most popular case of P is the so-called (capacitated) transportation problem. We obtain it if we put in P: V2 = ∅, V⁻(i) = ∅ for all i ∈ V1 and V⁺(i) = ∅ for all i ∈ V3. So we get

(TP)  v(TP) = min ∑_{(i,j)∈E} cij xij,
      s.t. ∑_{j∈V⁺(i)} xij ≤ ai, i ∈ V1,
           ∑_{j∈V⁻(i)} xji ≥ bi, i ∈ V3,
           0 ≤ xij ≤ dij, (i, j) ∈ E.

If dij = ∞ for all (i, j) ∈ E, the uncapacitated version of P is sometimes called the transshipment problem.
If all ai = 1 and all bi = 1, and additionally |V1| = |V3|, the transshipment problem reduces to the so-called assignment problem of the form

(AP)  v(AP) = min ∑_{i∈V1} ∑_{j∈V⁺(i)} cij xij,
      s.t. ∑_{j∈V⁺(i)} xij = 1, i ∈ V1,
           ∑_{j∈V⁻(i)} xji = 1, i ∈ V3,
           xij ≥ 0.

Note that |V1| = |V3| implies that all constraints in (AP) must be satisfied as equalities.
Let V = {1, . . . , m}. Still another important practical problem obtained from P is called the maximum flow problem. In this problem, V1 = {1}, V3 = {m}, V⁻(1) = ∅, V⁺(m) = ∅, a1 = ∞, bm = ∞.
The problem is to maximize the total flow into the vertex m under the capacity
constraints
(MF)  v(MF) = max ∑_{i∈V⁻(m)} xim,
      s.t. ∑_{j∈V⁺(i)} xij − ∑_{j∈V⁻(i)} xji = 0, i ∈ V2 = {2, . . . , m − 1},
           0 ≤ xij ≤ dij, (i, j) ∈ E.
Finally, consider the shortest path problem. Let cij be interpreted as the
length of edge (i, j). Define the length of a path in G to be the sum of the edge
lengths over all edges in the path. The objective is to find a path of minimum length
from a vertex 1 to vertex m. It is assumed that all cycles have nonnegative length.
This problem is a special case of the transshipment problem in which V1 = {1},V3 = {m}, a1 = 1 and bm = 1.
Let A be the incidence matrix of the digraph G = (V, E), where V = {1, . . . , m} and E = {e1, . . . , en}. With each arc ej we associate its length cj ≥ 0 and its flow xj ≥ 0. The shortest path problem may be formulated as:
(SP)  v(SP) = min ∑_{j=1}^{n} cj xj,
      s.t. Ax = (−1, 0, . . . , 0, +1)^T,  x ≥ 0.
The first constraint corresponds to the source vertex, the mth constraint corresponds
to the demand vertex, while the remaining constraints correspond to the intermediate
vertices, i.e., the points of distribution of the unit flow.
The dual problem to SP is
(DSP)  v(DSP) = max (−u1 + um),
       s.t. A^T u ≤ c.    (2.4)
3 The Shortest Path
3.1 The Primal-Dual Method
Consider the standard linear programming problem

(P)  min c^T x
     s.t. Ax = b ≥ 0, x ≥ 0

and its dual

(D)  max π^T b
     s.t. π^T A ≤ c^T.
Suppose that we have a current π which is feasible to the dual problem (D). Define the index set J by
J = {j : π^T A_j = cj},
where A_j is the jth column of A. Then for any j ∉ J, we have π^T A_j < cj. We call J the set of admissible columns. In order to search for an x which is not only feasible to the primal problem (P) but also, together with π, satisfies the complementarity conditions of (P) and (D), we invent a new LP, called the restricted primal (RP), as follows
(RP)  ξ = min ∑_{i=1}^{m} x_i^a
      s.t. Ax + x^a = b,
           xj ≥ 0 for all j ∈ J,
           xj = 0, j ∉ J,
           x_i^a ≥ 0, i = 1, . . . , m,
i.e.,
(RP)  ξ = min 0^T x_J + ∑_{i=1}^{m} x_i^a
      s.t. A_J x_J + x^a = b,
           x_J ≥ 0, x^a ≥ 0.
The dual of (RP) is
(DRP)  w = max π̄^T b
       s.t. π̄^T A_j ≤ 0, j ∈ J,
            π̄_i ≤ 1, i = 1, . . . , m.
Let (x_J, x^a) be an optimal basic feasible solution to (RP) and π̄ be an optimal basic feasible solution to (DRP) obtained from (x_J, x^a). If w = 0, then ξ = 0. Such an x is found. Otherwise, w > 0 and we can update π to
π_new = π + θπ̄.
The new cost to (D) is
(π_new)^T b = π^T b + θπ̄^T b = π^T b + θw,
which means that we shall get a better π if we can take θ > 0. On the other hand, π_new should be feasible to (D), i.e.,
(π_new)^T A_j = π^T A_j + θπ̄^T A_j ≤ cj.
Since for every j ∈ J, π̄^T A_j ≤ 0, we only need to consider those π̄^T A_j > 0, j ∉ J. Therefore, we can take
θ = min { (cj − π^T A_j) / (π̄^T A_j) : j ∉ J such that π̄^T A_j > 0 }.
Figure 3.1: An illustration of the primal-dual method
3.2 The Primal-Dual Method for the Shortest Path Problem
Let A be the incidence matrix of the digraph G = (V, E), where V = {1, . . . , m} and E = {e1, . . . , en}. With each arc ej we associate its length cj ≥ 0 and its flow xj ≥ 0. The shortest path problem, as we have already seen, may be formulated as:
min ∑_{j=1}^{n} cj xj,
s.t. Ax = (−1, 0, . . . , 0, +1)^T,
     x ≥ 0.    (3.1)
Let Ā be the submatrix of A obtained by removing the last row of A (it is redundant because the sum of all rows of A is zero). Then (3.1) turns into
min ∑_{j=1}^{n} cj xj,
s.t. Āx = (−1, 0, . . . , 0)^T,
     x ≥ 0.    (3.2)
The dual problem to (3.2) is
max −π1
s.t. πj − πi ≤ cij for all (i, j) ∈ E,
     πm = 0,    (3.3)

where we must fix πm = 0 because the last row of A is omitted in Ā.
The idea of the primal-dual algorithm is derived from the idea of searching for a feasible point x such that
xij = 0 whenever πi − πj < cij,
for the given feasible π (remark: think about the complementary slackness conditions). We search for such an x by solving an auxiliary problem, called the restricted primal (RP), determined by the π we are working with. If our search for the x is not successful, we nevertheless obtain information from the dual of RP, which we call DRP, which tells us how to improve the particular π with which we started.
Next, we give the details. The shortest-path problem can be written as

min ∑_{j=1}^{n} cj xj,
s.t. Ãx = (+1, 0, . . . , 0)^T,
     x ≥ 0,    (3.4)

where Ã = −Ā. The purpose of introducing Ã is to make the right-hand side of the constraint Ãx = b nonnegative. Now, the dual problem of (3.4) is
max π1
s.t. πi − πj ≤ cij for all (i, j) ∈ E,
     πm = 0.    (3.5)
For a given feasible π to (3.5), the set of admissible arcs is defined by
J = {arcs (i, j) : πi − πj = cij}.
The corresponding restricted primal problem (RP) is
ξ = min ∑_{i=1}^{m−1} x_i^a,
s.t. Ãx + x^a = (+1, 0, . . . , 0)^T,
     xj ≥ 0 for all j,
     xj = 0, j ∉ J,
     x_i^a ≥ 0, i = 1, . . . , m − 1.    (3.6)
and the dual of the restricted primal (DRP) is

w = max π̄1
s.t. π̄i − π̄j ≤ 0 for all (i, j) ∈ J,
     π̄i ≤ 1 for all i = 1, . . . , m − 1,
     π̄m = 0.    (3.7)
DRP (3.7) is very easy to solve:
Since π̄1 ≤ 1 and we wish to maximize π̄1, we try π̄1 = 1. If there is no path from node 1 to node m, using only arcs in J, then we can propagate the 1 from node 1 to all nodes reachable by a path from node 1 without violating the π̄i − π̄j ≤ 0 constraints, and an optimal solution to the DRP is then

π̄i = 1 for all nodes i reachable by paths from node 1 using arcs in J,
π̄i = 0 for all nodes i from which node m is reachable using arcs in J,
π̄i = 1 for all other nodes i.
(Notice that this π̄ is not unique.)
We can then calculate
θ1 = min { cij − (πi − πj) : arcs (i, j) ∉ J such that π̄i − π̄j > 0 }
to update π and J, and re-solve the DRP.
Figure 3.2: A solution to the restricted dual problem
π := π + θ1 π̄.
If we get to a point where there is a path from node 1 to node m using arcs in J, then π̄1 = 0, and we have found an optimal solution because ξ = w = 0. Any path from node 1 to node m using only arcs in J is optimal.
The primal-dual algorithm reduces the shortest path problem to repeated solution
of the simpler problem of finding the set of nodes reachable from a given node.
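That subproblem is a plain graph search; a depth-first sketch (the representation is ours):

```python
def reachable(n, arcs, source):
    """Nodes reachable from `source` using the given arcs -- the only
    subproblem the primal-dual method needs at each stage.  Nodes are
    0, ..., n-1 and arcs are ordered pairs (i, j)."""
    adj = {v: [] for v in range(n)}
    for i, j in arcs:
        adj[i].append(j)
    seen, stack = {source}, [source]
    while stack:
        i = stack.pop()
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return seen
```

Each call costs O(|V| + |A|), which is what makes the overall method efficient.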
Interpretation: Define at any point in the algorithm the set
W = {i : node m is reachable from i by admissible arcs} = {i : π̄i = 0}.
Then the variable πi remains fixed from the time that i enters W to the conclusion of the algorithm, because the corresponding π̄i will always be zero.
Every arc that becomes admissible (enters J) stays admissible throughout the algorithm, because once we have
πi − πj = cij for (i, j) ∈ E,
we always change πi and πj by the same amount.
πi, i ∈ W, is the length of the shortest path from node i to node m, and the algorithm proceeds by adding to W, at each stage, the nodes not in W that are next closest to node m.
There are at most |V| = m stages.
Dijkstra's algorithm is an efficient implementation of the primal-dual algorithm for the shortest path problem.
3.3 Bellman's Equation
Let cij be the length of arc (i, j) (positive arcs if cij > 0; nonnegative if cij ≥ 0).
Let uij be the length of the shortest path from i to j. Define
ui = u1i.
Then Bellman's equations are
u1 = 0,  ui = min_{k≠i} {uk + cki}.
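When no negative cycle exists, Bellman's equations can be solved by repeated relaxation (the Bellman-Ford scheme); the sketch below is one standard way to do it, not the method developed in the following sections:

```python
def bellman(n, arcs, c):
    """Solve Bellman's equations u1 = 0, ui = min_{k != i} (uk + cki) by
    repeated relaxation (the Bellman-Ford scheme).  Nodes are 0, ..., n-1
    with node 0 playing the role of node 1 in the text; c maps each arc
    (k, i) to its length."""
    INF = float("inf")
    u = [INF] * n
    u[0] = 0
    for _ in range(n - 1):        # n-1 passes suffice without negative cycles
        for (k, i) in arcs:
            if u[k] + c[(k, i)] < u[i]:
                u[i] = u[k] + c[(k, i)]
    return u
```

Each pass costs O(|A|), giving O(n|A|) overall; the specializations below (Dijkstra, acyclic graphs) are faster under extra assumptions.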
3.4 Dijkstra's Algorithm
In this section we assume that cij ≥ 0. Denote
P : permanently labeled nodes;
T : temporarily labeled nodes.
Figure 3.3: Bellman's equation
P and T always satisfy
P ∩ T = ∅  and  P ∪ T = V.
The label for node j is [uj, lj], where uj is the length of the (possibly temporary) shortest path from node 1 to j, and lj is the preceding node in the path.
Dijkstras algorithm can be summarized as follows.
Step 0. P = {1}, u1 = 0, l1 = 0, T = V\P. Compute
uj = c1j if (1, j) ∈ E, and uj = ∞ if (1, j) ∉ E;
lj = 1 if (1, j) ∈ E, and lj = 0 if (1, j) ∉ E.
Step 1. Find k ∈ T such that
uk = min_{j∈T} {uj}.
Let P = P ∪ {k} and T = T\{k}. If k = n, stop.
Step 2. For j ∈ T, if uk + ckj < uj, let [uj = uk + ckj, lj = k], and go back to Step 1.
Claim: At any step, uj is the length of the shortest path from 1 to j passing only through nodes in P.
[Suppose not, and let j be the first violation . . . ]
Claim: The total cost is O(n²).
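Steps 0-2 translate into an O(n²) routine directly; a sketch with a hypothetical cost matrix, using float('inf') where there is no arc:

```python
def dijkstra(n, cost, source=0):
    """O(n^2) Dijkstra exactly as in Steps 0-2: cost[i][j] >= 0 is the
    length of arc (i, j), float('inf') if there is no arc.  Returns
    (u, l): shortest distances from `source` and predecessor labels."""
    INF = float("inf")
    u = [cost[source][j] for j in range(n)]              # Step 0
    l = [source if cost[source][j] < INF else None for j in range(n)]
    u[source], l[source] = 0, source
    P = {source}
    while len(P) < n:
        # Step 1: permanently label the closest temporary node
        k = min((j for j in range(n) if j not in P), key=lambda j: u[j])
        P.add(k)
        for j in range(n):                               # Step 2
            if j not in P and u[k] + cost[k][j] < u[j]:
                u[j], l[j] = u[k] + cost[k][j], k

    return u, l

INF = float("inf")
cost = [[0, 1, 4, INF],
        [INF, 0, 2, 6],
        [INF, INF, 0, 1],
        [INF, INF, INF, 0]]
print(dijkstra(4, cost))
```

The n scans of Step 1 and Step 2 give the O(n²) bound claimed above.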
3.5 PERT or CPM Network
A large project is divisible into many unit tasks. Each task requires a certain amount of time for its completion, and the tasks are partially ordered.
This network is sometimes called a PERT (Project Evaluation and Review Technique) or CPM (Critical Path Method) network. A PERT network is necessarily acyclic.
Theorem 3.1 A digraph is acyclic if and only if its nodes can be renumbered in such a way that for every arc (i, j), i < j. [The work of this is O(n²).]
Claim: Any acyclic graph has at least one node of in-degree 0. After the renumbering, we have i < j for every arc (i, j).
Bellman's equations are
u1 = 0,  ui = min_{k≠i} {uk + cki}.
For acyclic graphs, they turn out to be
u1 = 0,  ui = min_{k<i} {uk + cki},  i = 2, . . . , n,
and can be solved in a single pass through the nodes in increasing order.
3.7 Floyd-Warshall Method for Shortest Paths Between All Pairs
Again, we need the assumption that the networks contain no negative cycles in order for the Floyd-Warshall method to work.
Step 0. u_ij^(1) = cij, i, j = 1, . . . , n.
Step k. For k = 1, . . . , n,
u_ij^(k+1) = min{u_ij^(k), u_ik^(k) + u_kj^(k)},  i, j = 1, . . . , n.
Claim: u_ij^(k) is the length of a shortest path from i to j, subject to the condition that the path does not pass through k, k + 1, . . . , n (i and j excepted). [This means u_ij^(n+1) = uij.]
Proof by induction. It is clearly true for Step 0. Suppose it is true for u_ij^(k) for all i and j. Now consider u_ij^(k+1). If a shortest path from node i to node j which does not pass through nodes k + 1, k + 2, . . . , n does not pass through k, then u_ij^(k+1) = u_ij^(k). Otherwise, if it does pass through node k, u_ij^(k+1) = u_ik^(k) + u_kj^(k).
It is easy to see that the complexity of the Floyd-Warshall method is O(n³).
The Floyd-Warshall method requires the storage of an n × n matrix. Initially this is U^(1) = C. Thereafter, U^(k+1) is obtained from U^(k) by using row k and column k to revise the remaining elements. That is, uij is compared with uik + ukj, and if the latter is smaller, uik + ukj is substituted for uij in the matrix.
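The update rule and the storage scheme just described fit in a few lines; a sketch:

```python
def floyd_warshall(c):
    """Floyd-Warshall on an n x n matrix of arc lengths, float('inf')
    where there is no arc; assumes no negative cycles.  Row k and
    column k of the current matrix revise the remaining elements."""
    n = len(c)
    u = [row[:] for row in c]         # U^(1) = C
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    return u
```

The three nested loops make the O(n³) bound explicit, with only the single n × n matrix stored.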
There are other methods of the above type, e.g., G. B. Dantzig's method.
3.8 Other Cases
1. Sparse graphs: |A| ≪ n².
2. Paths that do not allow repeated arcs, or that do not allow repeated nodes.
3. Paths with time constraints.
4. Paths with fixed charge.
4 The Greedy Algorithm and Computational Complexity
4.1 Matroid
1935: matroid theory founded by H. Whitney; 1965: J. Edmonds pointed out the significance of matroid theory to combinatorial optimization (CO).
Importance: 1) Many CO problems can be formulated as matroid problems, and solved by the same algorithm;
2) we can gain insight into CO problems;
3) it is a special tool for CO.
Definition 4.1 Suppose we have a finite ground set S, |S| < ∞, and a collection, F, of subsets of S. Then H := (S, F) is said to be an independent system if the empty set is in F and F is closed under inclusion; that is,
i) ∅ ∈ F;
ii) X ∈ F, Y ⊆ X ⟹ Y ∈ F.
Elements in F are called independent sets, and subsets of S not in F are called dependent sets.
Example: Matching system. G = (V, E),
F = {all matchings in G}.
[A matching M of a graph G = (V, E) is a subset of the edges with the property that no two edges of M share the same node. A matching M is a pairwise disjoint edge set.]
Figure 4.1: A matching example (edges e1, e2, e3, e4)
In Figure 4.1,
S = {e1, e2, e3, e4},  F = {∅, {e1}, {e2}, {e3}, {e4}, {e2, e3}}.
Definition 4.2 If H = (S, F) is an independent system such that
X, Y ∈ F, |X| = |Y| + 1 ⟹ there exists e ∈ X\Y such that Y + e ∈ F,
then H (or the pair (S, F)) is called a matroid.
Examples: i) Matric matroid: a matrix A = (a1, . . . , an) of size m × n, S = {a1, . . . , an},
X ∈ F ⟺ X = {a_{i1}, . . . , a_{ik}} is linearly independent.
ii) Graphic matroid: G = (V, E), S = E,
X ∈ F ⟺ X ⊆ E and X has no cycle.
ii) is a special case of i) with A = the vertex-edge incidence matrix.
4.2 The Greedy Algorithm
Suppose that H = (S, F) is an independent system and W : S → R₊ is a weight function; the problem is to find an independent set of maximum total weight.
Greedy Algorithm:
Suppose W(e1) ≥ W(e2) ≥ . . . ≥ W(en).
Step 0. Let X = ∅.
Step k. If X + ek ∈ F, let X := X + ek, where k = 1, . . . , n.
Theorem 4.1 (Rado, Edmonds) The above algorithm works if and
only if H is a matroid.
Applications:
1) The Maximal Spanning Tree Problem.
Suppose that there is a television network leasing video links so that
its stations in various places can be formed into a connected network.
Each link (i, j) has a different rental cost cij. The question is how the network can be constructed at minimum cost. Obviously, what is wanted is a minimum cost spanning tree of video links. Replacing cij by M − cij, where M is a large number, we can see that it then turns into a maximum spanning tree (MST) problem. Kruskal has already proposed the following solution: choose the edges one at a time in order of their weights, largest first, rejecting an edge only if it forms a cycle with edges already chosen.
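Kruskal's rule is exactly the greedy algorithm on the graphic matroid; a sketch with a union-find structure to detect cycles (the edge format (weight, i, j) is ours):

```python
def kruskal_max_spanning_tree(n, edges):
    """Greedy/Kruskal on the graphic matroid: edges are (weight, i, j)
    on vertices 0, ..., n-1.  Scan by weight, largest first, rejecting
    an edge iff it closes a cycle (i.e. the set would become dependent)."""
    parent = list(range(n))           # union-find over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    chosen = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                  # independent: no cycle created
            parent[ri] = rj
            chosen.append((w, i, j))
    return chosen
```

By Theorem 4.1 the greedy scan is guaranteed to return a maximum-weight spanning forest, because the cycle-free edge sets form a matroid.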
2) A Sequencing Problem.
Suppose that there are a number of jobs which are to be processed
by a single machine. All jobs require the same processing time. Each
job j has assigned to it a deadline dj, and a penalty pj, which must be
paid if the job is not completed by its deadline. What ordering of the
jobs minimizes the total penalty costs? It can be easily seen that there
exists an optimal sequence in which all jobs completed on time appear
at the beginning of the sequence in order of deadlines, earliest deadline
first. The late jobs follow, in arbitrary order. Thus, the problem is to
choose an optimal set of jobs which can be completed on time. The
following procedure can be shown to accomplish that objective.
Choose the jobs one at a time in order of penalties, largest first,
rejecting a job only if its choice would mean that it, or one of the jobs
already chosen, cannot be completed on time. [This requires checking to
see that the total amount of processing to be completed by a particular
deadline does not exceed the deadline in question.]
For example, consider the set of jobs below, where the processing
time of each job is one hour, and the deadlines are expressed in hours
of elapsed time.
Job Deadline Penalty
j dj pj
1 1 10
2 1 9
3 3 7
4 2 6
5 3 4
6 6 2
Job 1 is chosen, but job 2 is discarded, because the two together
require two hours of processing time and the deadline for job 2 is at
the end of the first hour. Jobs 3 and 4 are chosen, job 5 is
discarded, and job 6 is chosen. An optimal sequence is jobs 1, 4, 3, and
6, followed by the late jobs 2 and 5.
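The greedy selection on this example can be sketched as follows (unit processing times; the feasibility check schedules the kept jobs earliest-deadline-first, as described above):

```python
def select_on_time(jobs):
    """Greedy rule from the text: scan jobs by decreasing penalty, keeping a
    job only if all kept jobs can still finish on time (unit processing times).
    `jobs` maps a job id to its (deadline, penalty) pair."""
    chosen = []
    for j in sorted(jobs, key=lambda k: jobs[k][1], reverse=True):
        candidate = sorted(chosen + [j], key=lambda k: jobs[k][0])
        # Earliest-deadline-first is feasible iff the job in slot t (finishing
        # at time t + 1) meets its deadline, for every slot.
        if all(t + 1 <= jobs[k][0] for t, k in enumerate(candidate)):
            chosen = candidate
    return chosen
```

On the table above this keeps jobs 1, 4, 3, 6, matching the optimal sequence found by hand.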
3) A Semimatching Problem.
Let W be an m × n nonnegative matrix. Suppose we wish to choose
a maximum weight subset of elements, subject to the constraint that
no two elements are from the same row of the matrix. In other
words, the problem is to

maximize Σ_{i,j} wij xij
subject to Σ_j xij ≤ 1,  i = 1, . . . , m,
xij ∈ {0, 1}.
This semimatching problem can be solved by choosing the largest el-
ement in each row of W . Or alternatively: choose the elements one
at a time in order of size, largest first, rejecting an element only if an
element in the same row has already been chosen.
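A minimal sketch of the row-maximum rule (ties broken by the first maximum; W is assumed to be given as a list of rows):

```python
def semimatching(W):
    """Row-by-row greedy for the semimatching problem: since rows are
    independent, picking the largest entry of each row is optimal.
    Returns, for each row, the column index of the chosen entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in W]
```
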
4.3 General Introduction to Computational Complexity
Initiated in large measure by the seminal papers of S. A. Cook (1971)
and R. M. Karp (1972) in the area of discrete optimization.
Definition 4.3 An instance of an optimization problem consists of
a feasible set F and a cost function c : F → ℝ.
Some instances are larger than others, and it is convenient to define
the notion of the size of an instance.
Definition 4.4 The size of an instance is defined as the number of
bits used to describe the instance, according to a prescribed format.
Since arbitrary real numbers cannot be represented with finitely many bits, this
definition is geared towards instances involving integer (or rational)
numbers. Note that any nonnegative integer r less than or equal to U
can be written in binary as follows:

r = ak 2^k + ak−1 2^(k−1) + . . . + a1 2^1 + a0,

where the scalars a0, . . . , ak are 0 or 1. The number k is clearly at
most ⌊log2 U⌋, since r ≤ U. We can then represent r by the binary
vector (a0, a1, . . . , ak). With an extra bit for the sign, we can also represent
negative numbers. In other words, we can represent any integer with
absolute value less than or equal to U using at most ⌊log2 U⌋ + 2 bits.

Consider now an instance of a linear programming problem in standard
form, i.e., an m × n matrix A, an m-vector b, and an n-vector
c, and assume that the magnitude of the largest entry of (A, b, c)
is equal to U. Since there are mn + m + n entries in A, b, and c, the
size of such an instance is at most

(mn + m + n)(⌊log2 U⌋ + 2).
In fact, this count is not exactly correct: more bits will be needed
to encode flags that indicate where one number ends and another
starts. However, our count is right as far as the order of magnitude is
concerned. To avoid details of this kind, we will instead use the
order-of-magnitude notation and simply say that the size of
such an instance is O(mn log U).
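The bit counts above can be sketched directly; `int_bits` and `lp_instance_size` are illustrative names:

```python
import math

def int_bits(r):
    """Bits to represent the integer r: magnitude bits plus one sign bit."""
    magnitude = 1 if r == 0 else math.floor(math.log2(abs(r))) + 1
    return magnitude + 1

def lp_instance_size(m, n, U):
    """Upper bound on the encoding size of a standard-form LP whose entries
    have magnitude at most U: the count (mn + m + n)(floor(log2 U) + 2)."""
    return (m * n + m + n) * (math.floor(math.log2(U)) + 2)
```
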
Optimization problems are solved by algorithms. The running time
of an algorithm will, in general, depend on the instance to which it is
applied. Let T (n) be the worst-case running time of some algorithm
over all instances of size n, under the bit model.
Definition 4.5 An algorithm runs in polynomial time if there exists
an integer k such that T (n) = O(nk).
Fact: Suppose that an algorithm takes polynomial time under the
arithmetic model. Furthermore, suppose that on instances of size n,
any integer produced in the course of execution of the algorithm has
size bounded by a polynomial in n. Then, the algorithm runs in poly-
nomial time under the bit model as well.
The class P: A combinatorial optimization (CO) problem is in P if it admits an algorithm of polynomial complexity.
The class NP: A combinatorial problem is in NP if for all YES instances, there exists a polynomial length certificate that can be
used to verify in polynomial time that the answer is indeed yes.
NP: e.g., verifying the optimality of an LP solution.
Obviously, P ⊆ NP. But,
P = NP?
Definition 4.6 Suppose that there exists an algorithm for some problem
A that consists of a polynomial time computation together with a
polynomial number of subroutine calls to an algorithm for problem B.
We then say that problem A reduces (in polynomial time) to problem
B. For short, A ∝ B.
In the above definition, all references to polynomiality are with re-
spect to the size of an instance of problem A.
Theorem 4.2 If A ∝ B and B ∈ P, then A ∈ P.
The above theorem says that if A ∝ B, then problem A is not
much more difficult than problem B.
For example, let us consider the following scheduling problem: a set
of jobs is to be processed on two machines, where no job requires
more than three operations. A job may require, for example, processing
on machine one first, followed by machine two, and finally back on
machine one. Our objective is to minimize the makespan, i.e., to complete
the set of jobs in minimum time. Let us refer to this problem as (PJ).
Now, take the one-row integer program, or knapsack problem, that
we state in equality form: given integers a1, a2, . . . , an and b, does
there exist a subset S ⊆ {1, 2, . . . , n} such that Σ_{j∈S} aj = b? Calling
the latter problem (PK), our objective is to show that (PK) polynomially
reduces to (PJ).
For a given (PK) we construct an instance of (PJ) wherein the first
n jobs require only one operation, this being on machine one. Each
has processing time aj for j = 1, 2, . . . , n. Job n + 1 possesses three
operations constrained in such a way that the first is on machine two,
the second on machine one, and the last on machine two again. The
first such operation has duration b, the second duration 1, and the
third duration Σ_{j=1}^n aj − b.
Clearly, one lower bound on the completion time of all jobs in this
instance of (PJ) is the sum of the processing times of job n + 1, i.e.,
Σ_{j=1}^n aj + 1. Any feasible schedule for all jobs achieving this
makespan value must be optimal. Suppose a subset S exists such
that the knapsack problem is solvable. For (PJ) we can schedule the jobs
indexed by S first on machine one, followed by the second operation of
job n + 1, and complete with the remaining jobs (those not in S).
The first and last operations of job n + 1 (on machine two) then finish
at times b and Σ_{j=1}^n aj + 1, respectively. Thus, the completion time of
this schedule is Σ_{j=1}^n aj + 1.
If, conversely, there is no subset S ⊆ {1, 2, . . . , n} with Σ_{j∈S} aj = b,
our scheduling instance is forced into a solution in which either job
n + 1 waits before it obtains the needed unit of time on machine one,
or some of jobs 1, 2, . . . , n wait to keep job n + 1 progressing. Either
way the last job will complete after time Σ_{j=1}^n aj + 1.
We can conclude that the question of whether (PK) has a solution
can be reduced to asking whether the corresponding (PJ) has makespan
no greater than Σ_{j=1}^n aj + 1. Since (as is usually the case) the size of the
required (PJ) instance is a simple polynomial (in fact linear) function
of the size of (PK), we have a polynomial reduction. Problem (PK)
indeed reduces polynomially to (PJ).
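The construction of the (PJ) instance from a (PK) instance can be sketched as follows; the encoding of a job as a list of (machine, duration) operations is an assumption made for illustration:

```python
def knapsack_to_scheduling(a, b):
    """Build the (PJ) instance described above from the (PK) instance (a, b).
    Each job is a list of (machine, duration) operations in processing order."""
    jobs = [[(1, aj)] for aj in a]                  # jobs 1..n: one op, machine one
    jobs.append([(2, b), (1, 1), (2, sum(a) - b)])  # job n+1: three constrained ops
    target_makespan = sum(a) + 1                    # (PK) is a YES instance iff
    return jobs, target_makespan                    # this makespan is achievable
```
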
4.4 Three Forms of a CO Problem
A CO problem: F is the feasible solution set and c : F → ℝ is a cost function; consider
min c(f)
s.t. f ∈ F.
The above CO problem has three versions:
a) Optimization version: Find the optimal solution.
b) The evaluation version: Find the optimal value of c(f), f F .
c) The recognition version: Given an integer L, is there a feasible
solution f ∈ F such that c(f) ≤ L?
These three types of problems are closely related in terms of algorithmic
difficulty. In particular, the difficulty of the recognition problem
is usually a very good indicator of the difficulty of the corresponding
evaluation and optimization problems. For this reason, we can focus,
without loss of generality, on recognition problems.
Consider the following combinatorial optimization problem, called
the maximum clique problem:
Given a graph G = (V,E), find the largest subset C ⊆ V such that for all distinct u, v ∈ C, (u, v) ∈ E.
The maximum clique problem is in NP, or in short, Clique ∈ NP.
Assume that we have a procedure cliquesize which, given any graph
G, will evaluate the size of the maximum clique of G. In other words
cliquesize solves the evaluation version of the maximum clique problem.
We can then make efficient use of this routine in order to solve the
optimization version.
Step 0. X = ∅.
Step 1. Find v ∈ V such that cliquesize(G(v)) = cliquesize(G),
where G(v) is the subgraph of G consisting of v and all its adjacent
nodes.
Step 2. X = X + v, G = G(v)\v. If G = ∅, stop; otherwise, go to
Step 1.
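A sketch of this routine, with a brute-force `cliquesize` standing in for the assumed evaluation oracle (exponential, purely for illustration):

```python
from itertools import combinations

def cliquesize(nodes, edges):
    """Assumed evaluation oracle: size of a maximum clique of the subgraph
    induced by `nodes` (brute force here, for illustration only)."""
    nodes = list(nodes)
    for k in range(len(nodes), 0, -1):
        for sub in combinations(nodes, k):
            if all((u, v) in edges or (v, u) in edges
                   for u, v in combinations(sub, 2)):
                return k
    return 0

def max_clique(nodes, edges):
    """Recover a maximum clique using only cliquesize, per Steps 0-2 above."""
    X, nodes = [], set(nodes)
    while nodes:
        target = cliquesize(nodes, edges)
        for v in nodes:
            # G(v): v together with its neighbors in the current graph.
            nbrs = {u for u in nodes
                    if u != v and ((u, v) in edges or (v, u) in edges)}
            if cliquesize(nbrs | {v}, edges) == target:
                X.append(v)
                nodes = nbrs   # continue in G(v) \ v
                break
    return X
```
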
We now discuss the relation between the three variants in general.
Let us assume that the cost c(f) of any feasible f ∈ F can be computed
in polynomial time. It is then clear that a polynomial time algorithm for
the optimization problem leads to a polynomial time algorithm for
the evaluation problem. (Once an optimal solution is found, use
it to evaluate, in polynomial time, the optimal cost.) Similarly, a
polynomial time algorithm for the evaluation problem immediately translates to
a polynomial time algorithm for the recognition problem. For many
interesting problems, the converse is also true: namely, a polynomial
time algorithm for the recognition problem often leads to polynomial
time algorithms for the evaluation and optimization problems.
Suppose that the optimal cost is known to take one of M values. We
can then perform binary search and solve the evaluation problem using
⌈log M⌉ calls to an algorithm for the recognition problem. If log M is
bounded by a polynomial function of the instance size (which is often
the case), and if the recognition algorithm runs in polynomial time, we
obtain a polynomial time algorithm for the evaluation problem.
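The binary search can be sketched as follows, assuming the optimal cost is an integer in a known range and `recognizes(L)` answers the recognition question "is there f ∈ F with c(f) ≤ L?":

```python
def evaluate_by_binary_search(lo, hi, recognizes):
    """Solve the evaluation problem with about log2(hi - lo) recognition
    calls; assumes the optimal cost is an integer in [lo, hi]."""
    while lo < hi:
        mid = (lo + hi) // 2
        if recognizes(mid):
            hi = mid      # some feasible solution of cost <= mid exists
        else:
            lo = mid + 1  # every feasible solution costs more than mid
    return lo
```
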
We will now give another example to show how a polynomial time
evaluation algorithm can lead to a polynomial time optimization al-
gorithm by using the zero-one integer programming problem (ZOIP).
Given an instance I of ZOIP, let us consider a particular component
of the vector x to be optimized, say x1, and let us form a new instance
I′ by adding the constraint x1 = 0. We run the evaluation algorithm
on instances I and I′. If the outcome is the same for both instances,
we can set x1 to zero without any loss of optimality. If the outcome
is different, we conclude that x1 should be set to 1. In either case, we
have arrived at an instance involving one less variable to be optimized.
Continuing the same way, fixing the value of one variable at a time, we
obtain an optimization algorithm whose running time is roughly equal
to the running time of the evaluation algorithm times the number of
variables.
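This variable-fixing scheme can be sketched generically; `evaluate` is the assumed evaluation oracle, here taking a partial assignment, and `demo_evaluate` is a hypothetical brute-force oracle used only for illustration:

```python
from itertools import product

def recover_solution(n, evaluate):
    """Turn an evaluation oracle for a 0-1 program into an optimizer by fixing
    one variable at a time, as described above. `evaluate(fixed)` returns the
    optimal value subject to the partial assignment `fixed` (dict var -> 0/1)."""
    fixed = {}
    best = evaluate(fixed)
    for i in range(n):
        # If forcing x_i = 0 keeps the optimal value, fix it at 0; else at 1.
        if evaluate({**fixed, i: 0}) == best:
            fixed[i] = 0
        else:
            fixed[i] = 1
    return [fixed[i] for i in range(n)]

# Hypothetical oracle: minimize 3*x0 - x1 + 2*x2 over {0,1}^3 by brute force.
def demo_evaluate(fixed):
    return min(
        3*x[0] - x[1] + 2*x[2]
        for x in product([0, 1], repeat=3)
        if all(x[i] == v for i, v in fixed.items())
    )
```

The running time is roughly n + 1 oracle calls, matching the count given in the text.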
4.5 NPC
The class co-NP: A combinatorial problem is in co-NP if for all
NO instances, there exists a polynomial length certificate that can
be used to verify in polynomial time that the answer is indeed no.
Obviously, P ⊆ co-NP. But,
P = co-NP?
The next definition deals with the simplest type of a reduction,
where an instance of problem A is replaced by an equivalent instance
of problem B. Rather than developing a general definition of equiv-
alence, it is more convenient to focus on the recognition problems,
that is, problems that have a binary answer (e.g., YES or NO).
Figure 4.2: Relationships among P, NP and co-NP
Definition 4.7 Let A and B be two recognition problems. We say that
problem A transforms to problem B (in polynomial time) if there ex-
ists a polynomial time algorithm which given an instance I1 of problem
A, outputs an instance I2 of B, with the property that I1 is a YES
instance of A if and only if I2 is a YES instance of B. [A ∝ B.]
The class NP-hard: A problem A is NP-hard if for any problem B ∈ NP, B ∝ A.
Theorem 4.3 Suppose that a problem C is NP-hard and that C can
be transformed (in polynomial time) to another problem D. Then D is
NP-hard.
Define a set of Boolean variables {x1, x2, . . . , xn} and let the complement
of any variable xi be denoted by x̄i. In the language
of logic, these variables are referred to as literals. To each literal we
assign a label of true or false such that xi is true if and only if x̄i is
false.
Let the symbol ∨ denote or and the symbol ∧ denote and. We can then
write any Boolean expression in what is referred to as conjunctive
normal form, i.e., as a finite conjunction of disjunctions using each
literal at most once. For example, with the set of variables {x1, x2, x3, x4}
one might encounter the following conjunctive normal form expression
(x1 ∨ x̄2 ∨ x4) ∧ (x̄1 ∨ x2 ∨ x3) ∧ (x̄2 ∨ x̄4).
Each disjunctive grouping in parentheses is referred to as a clause. The
satisfiability problem is
Given a set of literals and a conjunction of clauses defined over the
literals, is there an assignment of values to the literals for which the
Boolean expression is true?
If so, then the expression is said to be satisfiable. The Boolean expression
above is satisfiable via the following assignment: x1 = x2 = x3 =
true and x4 = false. Let SAT denote the satisfiability problem and Q
be any member of NP.
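Checking a given assignment against a CNF expression is itself a polynomial time task (this is why SAT ∈ NP); a minimal evaluator, with an assumed integer encoding of literals:

```python
def satisfied(clauses, assignment):
    """Evaluate a CNF expression: a clause is a list of integer literals,
    where +i stands for x_i and -i for its complement; `assignment`
    maps i to True/False. The expression holds iff every clause does."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )
```
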
Theorem 4.4 (Cook (1971)) Every problem Q ∈ NP polynomially reduces to SAT.
Karp (1972) showed that SAT polynomially reduces to many com-
binatorial problems.
The class NPC: A recognition problem A is in NPC if
i) A ∈ NP and
ii) for any problem B ∈ NP, B ∝ A.
Cook's Theorem shows SAT ∈ NPC, because it can be checked easily
that SAT ∈ NP.
Examples of NPC problems: ILP, ZOIP, Clique, Vertex Packing,
TSP, 3-Index Assignment, Knapsack, etc.
Figure 4.3: Relationships among P, NP, NPC, and NP-hard
NP-hardness is not a definitive proof that no polynomial time algorithm
exists. For all we know, it is possible that ZOIP belongs
to P, and that P = NP. Nevertheless, NP-hardness suggests that we
should stop searching for a polynomial time algorithm, unless we are
willing to tackle the P = NP question.
For a good guide to the theory of NPC, see:
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.
C. H. Papadimitriou, Computational Complexity, 1995.