clique relaxation models in networks: theory, algorithms, and applications

106
1/93 Clique Relaxation Models in Networks: Theory, Algorithms, and Applications Sergiy Butenko ([email protected]) Department of Industrial and Systems Engineering Texas A&M University College Station, TX 77843-3131

Upload: ssa-kpi

Post on 11-Nov-2014

1.297 views

Category:

Education


0 download

DESCRIPTION

AACIMP 2011 Summer School. Operational Research stream. Lecture by Sergiy Butenko.

TRANSCRIPT

Page 1: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

1/93

Clique Relaxation Models in Networks:Theory, Algorithms, and Applications

Sergiy Butenko ([email protected])

Department of Industrial and Systems EngineeringTexas A&M University

College Station, TX 77843-3131

Page 2: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

2/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 3: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

3/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 4: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

4/93

Graphs

Graph is the mathematical term for a “network” - often visualized asvertices (points, nodes) connected by edges (lines, arcs)

Page 5: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

5/93

Graph theory basics

The origins of graph theory are attributed to the Seven Bridges ofKonigsberg problem solved by Leonhard Euler in 1735.

Page 6: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

6/93

Graph theory basics

A simple, undirected graph is a pair G = (V ,E ), where V is a finite set ofvertices and E ⊆ V × V is a set of edges, with each edge defined on a pairof vertices.

6

5 1

4 2

3

V = {1, 2, 3, 4, 5, 6}

E = {(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)}

Page 7: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

7/93

Graph theory basics

A graph can be represented in several ways.

� We can represent a graph G = (V ,E ) by its |V | × |V |adjacency matrix AG = [aij ], such that

aij =

{1, if (vi , vj) ∈ E ,0, otherwise.

� Another way of representing a graph is by its adjacency lists,where for each vertex v ∈ V we record the set of verticesadjacent to it.

Page 8: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

8/93

Graph theory basics

� A multigraph is a graph with repeated edges.

� A directed graph, or digraph, is a graph with directions assignedto its edges.

����

1

����

2

����

3

����

4

����

6

����

5

����

7

����

8

����

9

����

10-�������>

ZZZZZZZ~

PPPPPPq@@@@@@@R

PPPPPPq

������1

������1

��������

PPPPPPq

������1

��������

PPPPPPq

������1

@@@@@@@R

-

ZZZZZZZ~

�������>

Page 9: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

9/93

Graph theory basics

� If G = (V ,E ) is a graph and e = (u, v) ∈ E , we say that u isadjacent to v and vice-versa. We also say that u and v areneighbors.

� Neighborhood N(v) of a vertex v is the set of all its neighborsin G : N(v) = {j : (i , j) ∈ E}.

� If G = (V ,E ) is a graph and e = (u, v) ∈ E , we say that e isincident to u and v .

� The degree of a vertex v is the number of its incident edges.

Page 10: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

10/93

Graph theory basics

I G = (V ,E ) is a simple undirected graph, V = {1, 2, . . . , n}.

I G = (V ,E ), is the complement graph of G = (V ,E ), whereE = {(i , j) | i , j ∈ V , i 6= j and (i , j) /∈ E}.

I For S ⊆ V , G (S) = (S ,E ∩ S × S) the subgraph induced by S .

Page 11: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

11/93

Cliques and independent sets

I A subset of vertices C ⊆ V is called a clique if G (C ) is acomplete graph.

I A subset I ⊆ V is called an independent set (stable set, vertexpacking) if G (I ) has no edges.

I C is a clique in G if and only if C is an independent set in G .

Page 12: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

12/93

Cliques and independent sets

I A clique (independent set) is said to be

– maximal, if it is not a subset of any larger clique (independentset);

– maximum, if there is no larger clique (independent set) in thegraph.

I ω(G ) – the clique number of G .

I α(G ) – the independence (stability) number of G .

Page 13: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

13/93

Cliques and independent sets

1

5 2

4 3

1

5 2

4 3

Complement

{1,2,5} : maximal clique {1,2,5} : maximal independent set

{1,4} : maximal independent set {1,4} : maximal clique

{2,3,4,5} : maximum clique {2,3,4,5} : maximum independent set

GG

Page 14: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

14/93

Social networks

A social network is described by G = (V ,E ) where V is the set of“actors” and E is the set of “ties”.

I actors are people and a tie exists if two people know each other.

I actors are wire transfer database records and a tie exists if tworecords have the same matching field.

I actors are telephone numbers and a tie exists if calls were madebetween them.

Page 15: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

15/93

Social network analysis

“Popular” social networks:

I Kevin Bacon Number

I Erdos Number

I Six degrees of separation and small world phenomenon inacquaintance networks

Page 16: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

16/93

Cohesive subgroups

I Cohesive subgroups are “tightly knit groups” in a socialnetwork.

I Social cohesion is often used to explain and develop sociologicaltheories.

I Members of a cohesive subgroup are believed to shareinformation, have homogeneity of thought, identity, beliefs,behavior, even food habits and illnesses.

Page 17: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

17/93

Applications

I Acquaintance Networks - criminal network analysis

I Wire Transfer Database Networks - detecting moneylaundering

I Call Networks - organized crime detection

I Protein Interaction Networks - predicting protein functions

I Gene Co-expression Networks - detecting network motifs

I Metabolic Networks - identifying metabolic pathways

I Stock Market Networks - stock portfolios

I Internet Graphs - information search and retrieval

I Wireless and telecommunication networks - clustering androuting

Page 18: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

18/93

Random graph models

Uniform random graphs (Erdos & Renyi):

I G (n, p): n vertices, each edge exists with a probability p;

I G (n,M): all graphs with n vertices and M edges are assignedthe same probability

Power-law random graphs:

I the probability that a vertex has a degree k is proportional tok−β

Random graphs with a given degree sequence

Page 19: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

19/93

Scale-free graphs

I Scale-free nature : A property observed in many natural andman-made networks - biological networks, social networks,internet graphs

I Degree distribution follows a power law - P(k) ∝ k−β

I Fundamentally different from classical random graph model ofErdos and Renyi

I Principle of preferential attachment underlies the growth andevolution of such networks (i.e. the rich get richer!)

Page 20: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

20/93

Science - co-authorship networkImage source:http://www.jeffkennedyassociates.com:16080/connections/concept/image.html

Page 21: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

21/93

Internet colored by IP addressesImage source:http://www.jeffkennedyassociates.com:16080/connections/concept/image.html

Page 22: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

22/93

H. Pylori - protein interactionsImage generated using Graphviz

Page 23: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

23/93

Properties of cohesive subgroups

Some desirable properties of a cohesive subgroup are:

I Familiarity (degree);

I Reachability (distance, diameter);

I Robustness (connectivity);

I Density (edge density).

Earliest models of cohesive subgroups were cliques (Luce and Perry,1949), representing an “ideal” model of a cohesive subgroup.

Page 24: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

24/93

Clique relaxations: k-cliques and k-clubs

A k-clique is a subset of vertices C such that for every i , j ∈ C ,d(i , j) ≤ k .

A k-club is a subset of vertices D such that diam(G [D]) ≤ k .

6

5 1

4 2

3

I {2,3,4} is a 1-club ... the“regular” clique

I {1,2,4,5,6} is a 2-club

I {1,2,3,4,5} is a 2-clique butNOT a 2-club

I maximality of a 2-club is harderto test

Page 25: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

24/93

Clique relaxations: k-cliques and k-clubs

A k-clique is a subset of vertices C such that for every i , j ∈ C ,d(i , j) ≤ k .

A k-club is a subset of vertices D such that diam(G [D]) ≤ k .

6

5 1

4 2

3

I {2,3,4} is a 1-club ... the“regular” clique

I {1,2,4,5,6} is a 2-club

I {1,2,3,4,5} is a 2-clique butNOT a 2-club

I maximality of a 2-club is harderto test

Page 26: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

24/93

Clique relaxations: k-cliques and k-clubs

A k-clique is a subset of vertices C such that for every i , j ∈ C ,d(i , j) ≤ k .

A k-club is a subset of vertices D such that diam(G [D]) ≤ k .

6

5 1

4 2

3

I {2,3,4} is a 1-club ... the“regular” clique

I {1,2,4,5,6} is a 2-club

I {1,2,3,4,5} is a 2-clique butNOT a 2-club

I maximality of a 2-club is harderto test

Page 27: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

24/93

Clique relaxations: k-cliques and k-clubs

A k-clique is a subset of vertices C such that for every i , j ∈ C ,d(i , j) ≤ k .

A k-club is a subset of vertices D such that diam(G [D]) ≤ k .

6

5 1

4 2

3

I {2,3,4} is a 1-club ... the“regular” clique

I {1,2,4,5,6} is a 2-club

I {1,2,3,4,5} is a 2-clique butNOT a 2-club

I maximality of a 2-club is harderto test

Page 28: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

24/93

Clique relaxations: k-cliques and k-clubs

A k-clique is a subset of vertices C such that for every i , j ∈ C ,d(i , j) ≤ k .

A k-club is a subset of vertices D such that diam(G [D]) ≤ k .

6

5 1

4 2

3

I {2,3,4} is a 1-club ... the“regular” clique

I {1,2,4,5,6} is a 2-club

I {1,2,3,4,5} is a 2-clique butNOT a 2-club

I maximality of a 2-club is harderto test

Page 29: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

25/93

k-plex

Definition

A subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S ]) ≥ |S | − k

i.e. every vertex in G [S] has degree at least |S| − k .

5 4

1

3

2

6

I {3,4,5,6} is a 1-plex ... the“regular” clique

I {1,3,4,5,6} is a 2-plex(and NOT a 1-plex)

I {1,2,3,4,5,6} is a 3-plex(and NOT a 2-plex)

Page 30: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

25/93

k-plex

Definition

A subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S ]) ≥ |S | − k

i.e. every vertex in G [S] has degree at least |S| − k .

5 4

1

3

2

6

I {3,4,5,6} is a 1-plex ... the“regular” clique

I {1,3,4,5,6} is a 2-plex(and NOT a 1-plex)

I {1,2,3,4,5,6} is a 3-plex(and NOT a 2-plex)

Page 31: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

25/93

k-plex

Definition

A subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S ]) ≥ |S | − k

i.e. every vertex in G [S] has degree at least |S| − k .

5 4

1

3

2

6

I {3,4,5,6} is a 1-plex ... the“regular” clique

I {1,3,4,5,6} is a 2-plex(and NOT a 1-plex)

I {1,2,3,4,5,6} is a 3-plex(and NOT a 2-plex)

Page 32: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

25/93

k-plex

Definition

A subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S ]) ≥ |S | − k

i.e. every vertex in G [S] has degree at least |S| − k .

5 4

1

3

2

6

I {3,4,5,6} is a 1-plex ... the“regular” clique

I {1,3,4,5,6} is a 2-plex(and NOT a 1-plex)

I {1,2,3,4,5,6} is a 3-plex(and NOT a 2-plex)

Page 33: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

26/93

co-k-plex

Definition

A subset of vertices S is a co-k-plex if the maximum degree in theinduced subgraph ∆(G [S ]) ≤ k − 1.

i.e. degree of every vertex in G [S ] is at most k − 1.

S is a co-k-plex in G if and only if S is a k-plex in the complementgraph G .

5 4

1

3

2

6

5 4

1

3

2

6

3-plex Co-3-plex

Page 34: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

27/93

Structural Properties of a k-Plex

If G is a k-plex then

1. Every subgraph of G is a k-plex;

2. If k < n+22 then diam(G ) ≤ 2;

3. κ(G ) ≥ n − 2k + 2.

I k-plexes for “small” k values, guarantee reachability andconnectivity while relaxing familiarity.

Page 35: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

28/93

Known clique relaxation models

I k-clique (distance)

I k-club (diameter)

I k-plex (degree)

I k-core (degree)

I k-connected subgraph (connectivity)

I γ-quasi-clique (density)

I (γ, λ)-quasi-clique (density, degree)

I R-robust k-club (connectivity, diameter)

Page 36: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

29/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 37: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

30/93

Alternative clique definitions

Distance-based

Vertices are distance one away from each other.

Diameter-based

Vertices induce a subgraph of diameter one.

Domination-based

Every one vertex forms a dominating set.

Page 38: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

31/93

Alternative clique definitions

Degree-based

Each vertex is connected to all other vertices.

Connectivity-based

All vertices need to be removed to disconnect the induced subgraph.

Density-based

Vertices induce a subgraph that has all possible edges.

Page 39: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

32/93

Order of a clique relaxation

We introduce a hierarchy of clique relaxation models by definingtheir order as follows.

I Clique is the only clique relaxation of order 0.

I Clique relaxations of order 1 relax one of the elementaryproperties (distance, diameter, ...) used to define clique.

I Clique relaxations of order 2 relax two of the elementaryclique-defining properties.

I ...

Page 40: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

33/93

Nature of a clique relaxation

Parametric clique relaxations

relax a parameter (“one”, “all”) describing an elementaryclique-defining property.

Partial clique relaxations

require only a (high) fraction of elements to satisfy an elementaryclique-defining property or a parametric clique-relaxation property.

Page 41: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

34/93

Types of parametric clique relaxations

Parametric clique relaxation of type 1

Replace “one” with “(at most) k”, k > 1, in a definition of clique.

Parametric clique relaxation of type 2

Replace “one” with “(at most) all but k”, k > 1, in a definition of clique.

Parametric clique relaxation of type 1a

Replace “all” with “(at least) k”, k > 1, in a definition of clique.

Parametric clique relaxation of type 2a

Replace “all” with “(at least) all but k”, k > 1, in a definition of clique.

Page 42: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

35/93

First-order parametric

clique relaxations of type 1

Replace “one” with “(at most) k”, k > 1.

Distance-based: k-clique

Vertices are distance at most k away from each other.

Diameter-based: k-club

Vertices induce a subgraph of diameter at most k.

Domination-based: k-plex

Every k vertices form a dominating set.

Page 43: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

36/93

First-order parametric

clique relaxations of type 2

Replace “one” with “(at most) all but k”, k > 1

Distance-based: N/A

Vertices in S are distance at most |S | − k away from each other.

Diameter-based: N/A

Vertices induce a subgraph G [S ] of diameter at most |S | − k .

Domination-based: N/A

Every |S | − k vertices form a dominating set.

Page 44: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

37/93

First-order parametric

clique relaxations of type 1a

Replace “all” with “(at least) k”, k > 1

Degree-based: k-core

Each vertex is connected to at least k other vertices.

Connectivity-based: k-connected subgraph

At least k vertices need to be removed to disconnect the inducedsubgraph.

Density-based: N/A

Vertices induce a subgraph that has at least k edges.

Page 45: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

38/93

First-order parametric

clique relaxations of type 2a

Replace “all” with “(at least) all but k”, k > 1.

Degree-based: k-plex

Each vertex is connected to at least all but k other vertices.

Connectivity-based: N/A

At least all but k vertices need to be removed to disconnect theinduced subgraph.

Density-based: N/A

Vertices induce a subgraph that has at least all but k edges.

Page 46: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

39/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 47: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

40/93

Basics of the complexity theory

� Given a combinatorial optimization problem, a natural questionis:

is this problem “easy” or “hard”?

� Methods for easy and hard problems are fundamentallydifferent.

� How do we distinguish between “easy” and “hard” problems?

Page 48: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

41/93

“Easy” problems

� By easy or tractable problems we mean the problems that canbe solved in time polynomial with respect to their size.

� We also call such problems polynomially solvable and denotethe class of polynomially solvable problems by P.

� Sorting� Minimum weight spanning tree� Linear programming

Page 49: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

42/93

Defining “hard” problems

� How do we define “hard” problems?

� How about defining hard problems as all problems that are noteasy, i.e., not in P?

� Then some of the problems in such a class could be TOO hard– we cannot even hope to be able to solve them.

� We want to define a class of hard problems that we may beable to solve, if we are lucky (say, we may be able to guess thesolution and check that it is indeed correct).

Page 50: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

43/93

Defining “hard” problems

� Let us take the maximum clique problem as an example and tryto guess a solution.

� We can pick some random clique, compute its size, but how dowe know if this solution is optimal?

� Indeed, this is as difficult to verify as to solve the maximumclique problem – it is unlikely that we can count on luck in thiscase...

� How about transforming our optimization problem to such aform, where instead of finding an optimal solution the questionwould be “is there a solution with objective value ≥ s, where sis a given integer?

Page 51: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

44/93

Three versions of optimization problems

Consider a problem

min f (x) subject to x ∈ X .

I Optimization version: find x from X that maximizes f (x);Answer: x∗ maximizes f (x)

I Evaluation version: find the largest possible f (x);Answer: the largest possible value for f (x) is f ∗

I Recognition version: Given f ∗, does there exist an x such thatf (x) ≥ f ∗?Answer: “yes” or “no”

Page 52: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

45/93

Recognition problems

� Recognition problems can still be undecidable.

Halting problem: Given a computer program with itsinput, will it ever halt?

� Recall the “luck factor”: if we pick a random feasible solutionand it happens to give “yes” answer, then we solved theproblem in polynomial time.

Max clique: Randomly pick a solution (a clique). If itssize is ≥ s (which we can verify in polynomial time),then obviously the answer is “yes”. This clique can beviewed as a certificate proving that this is indeed a yesinstance of max clique.

Page 53: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

46/93

Class NP

� We only consider problems for which any yes instance thereexists a concise (polynomial-size) certificate that can be verifiedin polynomial time.

� We call this class of problems nondeterministic polynomial anddenote it by NP.

Page 54: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

47/93

P vs NP

� Note that any problem from P is also in NP (i.e., P ⊆ NP),so there are easy problems in NP.

� So, are there “hard” problems in NP, and if there are, how dowe define them?

� We don’t know if P = NP, but “most” people believe thatP 6= NP.

� An “easy” way to make $1,000,000!http://www.claymath.org/millennium/P vs NP/

� We can call a problem hard if the fact that we can solve thisproblem would mean that we can solve any other problem incomparable amount of time.

Page 55: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

48/93

Polynomial reducibility

� Reduce π1 to π2: if we can solve π2 fast, then we can solve π1

fast, given that the reduction is “easy”.

� Polynomial reduction from π1 to π2 requires existence ofpolynomial-time algorithms

1. A1 converts an input for π1 into an input for π2;2. A2 converts an output for π2 into output for π1.

� Transitivity: If π1 is polynomially reducible to π2 and π2 ispolynomially reducible to π3 then π1 is polynomially reducibleto π3.

Page 56: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

49/93

NP-complete problems

� A problem π is called NP-complete if

1. π ∈ NP;2. Any problem from NP can be reduced to π in polynomial time.

� A problem π is called NP-hard if any problem from NP can bereduced to π in polynomial time. (no π ∈ NP requirement)

� Due to transitivity of polynomial reducibility, in order to showthat a problem π is NP-complete, it is sufficient to show that

1. π ∈ NP;2. There is an NP-complete problem π′ that can be reduced to π

in polynomial time.

To use this observation, we need to know at least oneNP-complete problem...

Page 57: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

50/93

Satisfiability (SAT) problem

I A Boolean variable x is a variable that can assume only the valuestrue and false.

I Boolean variables can be combined to form Boolean formulas usingthe following logical operations:

1. Logical AND (∧ or ·) -conjunction2. Logical OR (∨ or +) - disjunction3. Logical NOT (x)

I A clause is Cj =kj∨

p=1yjp , where a literal yjp is xr or xr for some r .

I Conjunctive normal form (CNF):F =m∧

j=1

Cj , where Cj is a clause.

Page 58: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

51/93

Satisfiability problem

I A CNF F is called satisfiable if there is an assignment ofvariables such that F = 1 (TRUE ).

I Satisfiability (SAT) problem: Given m clauses C1, . . . ,Cm

involving the variables x1, . . . , xn, is the CNF

F =m∧

j=1

Cj ,

satisfiable?

Theorem (Cook, 1971)

SAT is NP-complete.

Page 59: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

52/93

“Best” approximation algorithms and

heuristics

� For some problems there are hardness of approximation resultsstating that the problem is hard to approximate within a certainfactor.

� For example, the k-center problem is hard to approximatewithin a factor better than 2.

� Then any polynomial-time algorithm approximating thek-center problem within the factor of 2 can be considered the“best” approximation algorithm for this problem.

Page 60: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

53/93

“Best” approximation algorithms and

heuristics

� However, some problems are even harder to approximate. Forexample, the maximum clique is hard to approximate within afactor n1−ε for any positive ε.

� In this case, by the “best” heuristic we could mean a heuristicthat cannot be provably outperformed by any otherpolynomial-time algorithm (unless P = NP).

Page 61: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

54/93

Recognizing the gap between k-club and

l -club numbers

Theorem

Let positive integer constants k and l, l < k be given. The problemof checking whether ωl(G ) = ωk(G ) is NP-hard.

Note thatω(G ) ≤ ∆(G ) + 1 ≤ ωk(G )

and observe that we can easily check whether ω(G ) = ∆(G ) + 1.

Hence, it is NP-hard to check whether ωk(G ) = ∆(G ) + 1.

Page 62: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

55/93

“Best” heuristics for k-club/clique

Corollary

Let k be a fixed integer, k ≥ 2. Unless P = NP, there cannot be apolynomial time algorithm that finds a k-club of size greater than∆(G ) + 1 whenever such a k-club exists in the graph.

Page 63: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

56/93

Protein Interaction Networks

Page 64: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

57/93

A max 2-club/clique of S. Cerevisiae.

Page 65: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

58/93

A max 2-club/clique of H. Pylori.

Page 66: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

59/93

A max 3-clique/club of S. Cerevisiae

Page 67: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

60/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 68: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

61/93

Size of clique relaxations in a graph

Question: How large could a k-clique (k-club, k-plex) be for agraph with a small clique number?

Answers:

I k-club: ω(K1,n) = 2, but ωk(K1,n) = n + 1, could be arbitrarylarge

I k-clique: same as k-club

I k-plex:

Lemma

For any graph G and a positive integer k,

ωk(G ) ≤ kω(G ) and αk(G ) ≤ kα(G )

Page 69: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

61/93

Size of clique relaxations in a graph

Question: How large could a k-clique (k-club, k-plex) be for agraph with a small clique number?Answers:

I k-club: ω(K1,n) = 2, but ωk(K1,n) = n + 1, could be arbitrarylarge

I k-clique: same as k-club

I k-plex:

Lemma

For any graph G and a positive integer k,

ωk(G ) ≤ kω(G ) and αk(G ) ≤ kα(G )

Page 70: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

61/93

Size of clique relaxations in a graph

Question: How large could a k-clique (k-club, k-plex) be for agraph with a small clique number?Answers:

I k-club: ω(K1,n) = 2, but ωk(K1,n) = n + 1, could be arbitrarylarge

I k-clique: same as k-club

I k-plex:

Lemma

For any graph G and a positive integer k,

ωk(G ) ≤ kω(G ) and αk(G ) ≤ kα(G )

Page 71: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

62/93

Upper bound on the quasi-clique number

Proposition

The γ-clique number ωγ(G ) of a graph G with n vertices and medges satisfies the following inequality:

ωγ(G ) ≤ γ +√γ2 + 8γm

2γ. (1)

Moreover, if a graph G is connected then

ωγ(G ) ≤γ + 2 +

√(γ + 2)2 + 8(m − n)γ

2γ. (2)

Page 72: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

63/93

Relation between ωγ(G ) and ω(G )

Proposition

The γ-clique number ωγ(G ) and the clique number ω(G ) of graphG satisfy the following inequalities:

ω(G )− 1

ω(G )≤ ωγ(G )− 1

ωγ(G )≤ 1

γ

ω(G )− 1

ω(G ). (3)

Corollary

If γ > 1− 1ω(G) then

ωγ(G ) ≤ ω(G )γ

1− ω(G ) + ω(G )γ. (4)

Page 73: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

64/93

Relation between omegaγ(G ) and ω(G )

Table: The value of upper bound (4) on γ-clique number withγ = 0.95, 0.9, 0.85 for graphs with small clique number.

ω(G ) 1− 1ω(G) 0.95 0.9 0.85

3 0.667 3.35 3.86 4.644 0.75 4.75 6 8.55 0.8 6.33 9 176 0.83 8.14 13.5 517 0.86 10.23 21 –8 0.88 12.67 36 –9 0.89 15.55 81 –

10 0.9 19 – –

Page 74: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

65/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 75: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

66/93

Math programming formulations

I Consider a simple undirected graph G = (V ,E ) with n vertices

I Let A = [aij ]ni ,j=1 be the adjacency matrix of G

I Let x = (x1, . . . , xn) be a 0-1 vector with xi = 1 if node ibelongs to Gs , and xi = 0 otherwise.

Page 76: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

67/93

Math programming formulations

I GS is a γ-clique if it has at least γ|GS |(|GS | − 1)/2 edges

I We have: |GS | =n∑

i=1xi

I This number of edges can be expressed in terms of vector x as:

12γ

n∑i=1

xi

(n∑

i=1xi − 1

)= 1

(n∑

i ,j=1xixj −

n∑i=1

xi

)

= 12γ

(n∑

i ,j=1;i 6=j

xixj +n∑

i=1x2i −

n∑i=1

xi

)= 1

2γn∑

i ,j=1;i 6=j

xixj

I The number of edges in GS is 12

n∑i ,j=1

aijxixj

Page 77: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

68/93

Math programming formulations

We obtain the following 0-1 problem with one quadratic constraint:

maxn∑

i=1

xi

subject to:n∑

i ,j=1

aijxixj ≥ γn∑

i ,j=1;i 6=j

xixj

Page 78: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

69/93

Math programming formulations

Define wij = xixj . The constraint wij = xixj is equivalent to

wij ≤ xi ,wij ≤ xj ,wij ≥ xi + xj − 1 .

Linearized formulation: maxn∑

i=1xi

subjectto :n∑

i ,j=1

aijwij ≥ γn∑

i ,j=1;i 6=j

wij

wij ≤ xi ,wij ≤ xj ,wij ≥ xi + xj − 1 .

wij , xi ∈ {0, 1}, ∀i < j = 1, . . . , n

O(n2) 0-1 variables, O(n2) constraints

Page 79: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

70/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 80: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

71/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 81: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

72/93

Clique relaxation and node deletion problems

Clique

relaxation

problems

Node deletion

problems

with heredity

Node deletion problems

Page 82: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

73/93

Node deletion problems

I A graph property Π is said to be hereditary on inducedsubgraphs, if G is a graph with property Π and the deletion ofany subset of vertices does not produce a graph violating Π

I Π is said to be nontrivial if it is true for a single vertex graphand is not satisfied by every graph

I Π is said to be interesting if there are arbitrarily large graphssatisfying Π

Page 83: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

74/93

Yannakakis theorem

The maximum Π problem is to find the largest order inducedsubgraph that does not violate property Π

Theorem (Yannakakis, 1978)

The maximum Π problem for nontrivial, interesting graph propertiesthat are hereditary on induced subgraphs is NP-hard.

Page 84: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

75/93

Max weight node deletion problem

Given

I a simple, undirected graph G = (V ,E ),

I positive weights w(v) for each v ∈ V ,

I and a nontrivial, interesting and hereditary (on inducedsubgraphs) property Π.

Find

ν(G ) = max{w(P) : P ⊆ V , G [P] satisfies Π},

where w(P) =∑

v∈P w(v) and G [P] is the subgraph induced by P.

Page 85: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

76/93

Generalizing max clique algorithms

Some of the most practically effective algorithms for the maximumclique problem rely on the fact that clique is hereditary on inducedsubgraphs.

I Carraghan and Pardalos (1990) - used as DIMACS benchmark

I Ostergard (2002)

We use this observation to develop a generalized algorithm for theminimum weight node deletion problem (GAMWNDP).

Page 86: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

77/93

GAMWNDP

I Order vertices V = {v1, v2, · · · , vn}, define Si = {vi , vi+1, · · · , vn}I We compute the function c(i) that is the weight of the maximum

induced subgraph with property Π in G [Si ].

I Obviously, c(n) = w(vn) and c(1) = ν(G ).

I For the unweighted case with w(vi ) = 1, i = 1, . . . , n we have

c(i) =

{c(i + 1) + 1, if the solution must contain vi

c(i + 1), otherwise

I For the weighted case, c(i) > c(i + 1) implies that vi belongs toevery optimal solution, and c(i) ≤ c(i + 1) + w(vi ).

Page 87: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

78/93

GAMWNDP

I We compute the value of c(i) starting from c(n) and down toc(1), and in each major iteration work with a current feasiblesolution P, a candidate set C , and an incumbent solution S .

I Pruning occurs whenI w(C ) + w(P) < w(S) orI c(i) + w(P) < w(S), where i = min{j : vj ∈ C}.

I Problem-specific features:

I candidate set generation;I vertex ordering.

Page 88: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

79/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 89: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

80/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 90: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

81/93

k-Core and k-community

k-Core

A sub-graph G ′ of G such that each node has degree greater than orequal to k.

k-Community

A subgraph G ′ of G such that each (i , j) ∈ E ′ if (i , j) ∈ E and i , jhave at least k common neighbors in G ′.

Page 91: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

82/93

k-Core exampleExample - k-Core

(a) Original Graph (b) Max 3-core

Each node in the max 3-core is connected to at least 3 othervertices.

Anurag Verma, Sergiy Butenko, Texas A&M University 4/ 20

Page 92: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

83/93

k-Community exampleExample - k-Community

(c) Original Graph (d) Max 2-community

Any two vertices in the max 2-community are connected only ifthey have at least 2 mutual neighbors.

Anurag Verma, Sergiy Butenko, Texas A&M University 5/ 20

Page 93: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

84/93

Scale reduction approaches

I Reduce problem size by ignoring certain parts of the graphwhich we are certain would not be in the graph.

I Most commonly used technique: Peeling

1. Remove all the vertices with degree ≤ k − 1.2. If no vertices removed, stop. Else, redo step 1 using the

obtained graph.

I Peeling is nothing but finding the k-core!

Page 94: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

85/93

Scale reduction approaches

Property 1

A clique of size k is both a (k − 1)-core and (k − 2)-community.

Property 2

If the k-Core of G is empty, then ω(G ) < k + 1.

Property 3

If the k-Community of G is empty, then ω(G ) < k + 2.

Page 95: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

86/93

k-Core upper boundk-Core Upper Bound

Find smallest k1 such that k1-Core is empty. UB = k1 = 5.

(e) Original Graph (f) 4-core

(g) Original Graph (h) 5-core

Anurag Verma, Sergiy Butenko, Texas A&M University 10/ 20

Find smallest k1 such that k1-Core is empty. UB = k1 = 5.

Page 96: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

87/93

k-Community upper bound

k-Community Upper Bound

Find smallest k2 such that k2-Comm is empty.

(i) Original Graph (j) Intermediate (k) 2-comm

3-comm is empty. Thus UB = k2 + 1 = 4.

Anurag Verma, Sergiy Butenko, Texas A&M University 11/ 20

Find smallest k2 such that k2-Comm is empty. UB = k2 + 1 = 4.

Page 97: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

88/93

Finding the least k2

A simple binary search strategy:

Step 0 Set ku = n − 2, kl = 0, k2 = n − 2

Step 1 If k2-Comm(G) is empty, ku = k2. Else, kl = k2.

Step 2 If ku − kl ≤ 1, set k2 = ku, STOP. Else, setk2 = (ku + kl)/2, go to Step 1.

Page 98: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

89/93

The call graph

Abello, Pardalos, and Resende, 1999:

I 20-day period: 290 million vertices and 4 billion edges.

I one-day call graph: 53,767,087 vertices and over 170 millions ofedges.

I 3,667,448 connected components; 302,468 of them with morethan 3 vertices.

I A giant connected component with 44,989,297 vertices.

Page 99: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

90/93

Upper bounds on SNAP databaseUpper Bounds obtained on SNAP Graphs

Graph n m k-Core UB k-Comm UBUB n’ UB n”

WikiTalk 2394385 4659565 132 700 52 237cit-Patents 3774768 16518947 65 106 35 83Email-EuAll 265214 364481 38 292 19 62Cit-HepPh 34546 420877 31 40 24 36Cit-HepTh 27770 352285 38 52 29 48Slashdot0811 77360 469180 55 129 34 87Slashdot0902 82168 504230 56 134 35 96soc-Epinions1 75879 405740 68 486 32 61Wiki-Vote 7115 100762 54 336 22 50p2p-Gnutella31 62586 147892 7 1004 4 57p2p-Gnutella04 10876 39994 8 365 4 12p2p-Gnutella24 26518 65369 6 7480 4 41p2p-Gnutella25 22687 54705 6 6091 4 25p2p-Gnutella30 36682 88328 8 14 4 42web-Stanford 281903 1992636 72 387 61 126web-NotreDame 325729 1090108 156 1367 155 155web-Google 875713 4322051 45 48 44 48web-BerkStan 685230 6649470 202 392 201 392Amazon0601 403394 2443408 11 32886 11 5361Amazon0505 410236 2439437 11 32632 11 4878Amazon0302 262111 899792 7 286 7 105Amazon0312 400727 2349869 11 27046 11 4534

Comparison of the upper bounds obtained by a k-core scheme vs ak-comm scheme.

Anurag Verma, Sergiy Butenko, Texas A&M University 13/ 21

Page 100: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

91/93

Upper bounds on SNAP databaseUpper Bounds obtained on SNAP Graphs

Graph n m k-Core UB k-Comm UBUB n’ UB n”

WikiTalk 2394385 4659565 132 700 52 237cit-Patents 3774768 16518947 65 106 35 83Email-EuAll 265214 364481 38 292 19 62Cit-HepPh 34546 420877 31 40 24 36Cit-HepTh 27770 352285 38 52 29 48Slashdot0811 77360 469180 55 129 34 87Slashdot0902 82168 504230 56 134 35 96soc-Epinions1 75879 405740 68 486 32 61Wiki-Vote 7115 100762 54 336 22 50p2p-Gnutella31 62586 147892 7 1004 4 57p2p-Gnutella04 10876 39994 8 365 4 12p2p-Gnutella24 26518 65369 6 7480 4 41p2p-Gnutella25 22687 54705 6 6091 4 25p2p-Gnutella30 36682 88328 8 14 4 42web-Stanford 281903 1992636 72 387 61 126web-NotreDame 325729 1090108 156 1367 155 155web-Google 875713 4322051 45 48 44 48web-BerkStan 685230 6649470 202 392 201 392Amazon0601 403394 2443408 11 32886 11 5361Amazon0505 410236 2439437 11 32632 11 4878Amazon0302 262111 899792 7 286 7 105Amazon0312 400727 2349869 11 27046 11 4534

Comparison of the number of nodes in the residual graph after thescale reduction.

Anurag Verma, Sergiy Butenko, Texas A&M University 13/ 20

Page 101: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

92/93

Upper bounds on random graphs

Uniform random graphs with 1000 nodes.

Bollobas-Erdos estimate: 2 log(n)/ log(1/p).

Page 102: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

93/93

Outline

IntroductionGraph Theory Basics

Taxonomy of Clique Relaxations

TheoryComplexity IssuesAnalytical boundsMathematical programming formulations

AlgorithmsNode Deletion Problems with HeredityScale Reduction Approaches

Conclusion and References

Page 103: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

94/93

Conclusion

I We propose a taxonomy of clique relaxation models

I opens new avenues for systematic study of clique relaxationsI covers known clique relaxation modelsI unveils new models of potential interest

I An effective algorithm is provided for node deletion problemsthat are nontrivial, interesting, and hereditary on inducedsubgraphs

I state-of-the-art exact approach to max k-plexI can be used to solve many other important problems

I A scale-reduction approach based on the concept ofk-community is proposed

I outperforms the peeling approach based on k-core

Page 104: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

95/93

References

B. Balasundaram, S. Butenko, I. V. Hicks.Clique relaxations in social network analysis: The maximum k-plexproblem.Operations Research, 59: 133–142, 2011.

S. Trukhanov, B. Balasundaram, and S. Butenko.Exact algorithms for hard node deletion problems in networks.Submitted.

S. Butenko, J. Pattillo, and N. Youssef.A taxonomy of clique relaxation models in networks.In preparation.

A. Verma and S. Butenko.A new scale-reduction technique for the maximum clique and relatedproblems.In preparation.

Page 105: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

96/93

References

J. Pattillo, A. Veremyev, S. Butenko, and V. Boginski.On the maximum quasi-clique problem.Submitted.

S. Butenko, S. Kahruman, and O. Prokopyev.On “provably best” heuristics for hard optimization problems.In preparation.

Page 106: Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

97/93

Thank you!