Construction of Near-optimal Vertex Clique Covering for Real-world Networks
David Chalupa
Institute of Applied Informatics, Faculty of Informatics and Information Technologies
Slovak University of Technology
October 14, 2013
David Chalupa (FIIT SUT) Near-optimal Vertex Clique Covering October 14, 2013 1 / 48
Overview
the vertex clique covering problem (CCP)
some properties of real-world networks
motivation and relations to different fields
the approach: iterated greedy (IG) clique covering; randomized local search (RLS) for maximum independent set
experimental results
(a sketch of a few) theoretical results
conclusions and discussion
Clique Covering and Community Detection
Figure : Clique covering and clustering: Illustration on a small social network.
The (Vertex) Clique Covering Problem (CCP) - Illustration
Figure : Two solutions to CCP in a small sparse uniform random graph (on the left) and in a small sample of a social network (on the right).
The (Vertex) Clique Covering Problem (CCP) - Definition
objective: minimize k ≤ |V| such that there are pairwise disjoint classes V1, V2, ..., Vk ⊆ V, which:
- cover the whole vertex set, i.e. V1 ∪ V2 ∪ ... ∪ Vk = V, and
- induce cliques, i.e. ∀i = 1..k: d(G(Vi)) = 1,
where d(G) = 2|E| / (|V|(|V| − 1)) is the density of G
CCP is NP-hard: the decision problem with fixed k is NP-complete (Karp, 1972)
equivalency: a clique covering of G with k cliques corresponds to a graph coloring of the complementary graph Ḡ with k colors
similarity to max clique / independent set: adaptations of heuristics between these problems were proposed in the past (Gendreau et al., 1993)
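The definition can be checked mechanically. The following is a minimal sketch, not taken from the slides: the function names, the edge-list representation and the toy graph are assumptions made for illustration, as is the convention that a single vertex counts as a clique (the density formula is undefined for |V| = 1).

```python
def density(vertices, edges):
    """Density d = 2|E| / (|V|(|V| - 1)) of the subgraph induced by `vertices`."""
    vs = set(vertices)
    n = len(vs)
    if n < 2:
        return 1.0  # assumed convention: a single vertex is a clique
    m = sum(1 for u, v in edges if u in vs and v in vs)
    return 2 * m / (n * (n - 1))


def is_clique_covering(vertices, edges, classes):
    """True if `classes` are pairwise disjoint, cover `vertices` and induce cliques."""
    covered = [v for c in classes for v in c]
    disjoint = len(covered) == len(set(covered))
    covering = set(covered) == set(vertices)
    return disjoint and covering and all(density(c, edges) == 1.0 for c in classes)


# Hypothetical toy graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (1, 2), (2, 3)]
print(is_clique_covering(V, E, [[0, 1, 2], [3]]))  # a covering with k = 2
print(is_clique_covering(V, E, [[0, 1, 2, 3]]))    # the whole graph is not a clique
```

Each undirected edge is listed once, so the edge count of an induced subgraph is a simple filter over the edge list.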
Clique Covering Problem (CCP)
Minimizing the number of partitions under the assumption that the partitions induce cliques.
Exact solution is possible, although computationally intense algorithms are needed (Karp, 1972).
However, does it really hold that CCP in social networks is as hard as for general graphs? Maybe not...
Sparseness / Asymptotical Sparseness
a class of graphs is called asymptotically sparse if and only if for n vertices and m(n) edges (as a function of the number of vertices), it holds that:
m(n) ≺ (n choose 2) ≡ δ(n) ≺ n,
where δ(n) = 2m(n)/n is the average degree of a vertex
Figure : The average degree δ(n) in a growing sample from a Slovak social network and its difference function δ′(n).
Degree Distribution / Scale-free Structure
degree distribution P(k): the fraction of vertices in the network with degree k
scale-free network: a network for which P(k) ∼ k^(−γ), where γ is a coefficient of steepness of the distribution
higher γ means that the network is sparser; for many real-world networks: γ ∈ [2, 3]
Figure : Degree distributions (log-log scale) for an artificial scale-free network, BA2_10000 (on the left), and a snapshot of the Internet, as-22july06 (on the right).
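Both statistics used here, the average degree δ(n) = 2m(n)/n and the degree distribution P(k), are simple to compute from an edge list. The sketch below is illustrative only: the function names and the star-graph example are assumptions, not part of the original experiments.

```python
from collections import Counter


def average_degree(n, edges):
    """Average degree 2m/n of a graph with n vertices and the given edge list."""
    return 2 * len(edges) / n


def degree_distribution(n, edges):
    """P(k): fraction of the n vertices (numbered 0..n-1) that have degree k."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    counts = Counter(deg[v] for v in range(n))  # isolated vertices get degree 0
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical example: a star with centre 0 and four leaves.
star_edges = [(0, i) for i in range(1, 5)]
print(average_degree(5, star_edges))       # 1.6
print(degree_distribution(5, star_edges))  # {1: 0.8, 4: 0.2}
```

Plotting P(k) against k on log-log axes gives the kind of picture shown in the figure; an approximately straight line is the usual visual hint of a power law.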
Motivation and Applications
data mining (Sun et al., 2008) and web mining (Tang et al., 2011)
research citation network analysis (Sun et al., 2008)
protein interaction and gene regulatory networks in bioinformatics (Gao et al., 2009; Boyer et al., 2005)
analysis of terrorist organization networks (Patillo et al., 2012)
infectious diseases epidemiology (Rothenberg et al., 1996)
scheduling and timetabling (Burke et al., 2007)
frequency assignment in mobile radio networks (Smith et al., 1998)
Relation to Different Fields
Artificial Intelligence: designing / choosing an efficient heuristic
Graph Mining: knowledge discovery from raw network data
Theoretical Computer Science: understanding how well (badly) the algorithm performs (and why)
Statistical Mechanics: understanding the relation to how the network was created
Currently Available Algorithms: Graph Coloring Algorithms
Brelaz’s graph coloring heuristic (Brelaz, 1979) - generally a good tradeoff between quality and speed
O(n^2) time; O(n^2) space complexity
Leighton’s graph coloring heuristic (Leighton, 1979) - more suitable for some graph classes
O(n^3) time; O(n^2) space complexity
Culberson and Luo’s iterated greedy heuristic (Culberson and Luo, 1996) - repeated construction of solutions by a greedy algorithm combined with a stochastic improvement mechanism
O(n^2 − m) time per iteration; O(n^2) space complexity ¹
¹ The time and space complexities of all these algorithms already take into account that CCP is equivalent to graph coloring of the complementary graph (i.e. CCP for a sparse graph leads to coloring of a dense graph).
Greedy Clique Covering (GCC)
not just a greedy algorithm - also a “genotype-phenotype mapping”
permutation of vertices → clique covering
let Γ(v, c) be the number of neighbors of v with label c
if Γ(v, c) = |Vc|, i.e. all vertices in clique G(Vc) are neighbors of v, then we can put v into this clique
if there are several suitable cliques, we choose the one with minimum c - this is called the First Fit strategy (Welsh and Powell, 1967)
O(m) time; O(n) space complexity
Figure : Illustration graph for greedy clique covering (GCC).
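The GCC rules above can be sketched in a few lines. This is an illustrative reading of the slide rather than the author's implementation: the adjacency-set representation, the function name and the toy path graph are assumptions.

```python
def greedy_clique_covering(adj, perm):
    """First Fit greedy clique covering.

    adj:  dict mapping each vertex to the set of its neighbours (assumed format)
    perm: the order in which vertices are processed
    Returns the cliques as lists of vertices, indexed by their label c.
    """
    label = {}    # vertex -> label c of its clique
    cliques = []  # cliques[c] is the vertex list V_c
    for v in perm:
        # gamma[c] = Γ(v, c): number of already labelled neighbours of v in clique c
        gamma = {}
        for u in adj[v]:
            if u in label:
                gamma[label[u]] = gamma.get(label[u], 0) + 1
        # v fits into clique c exactly when Γ(v, c) = |V_c|
        fits = [c for c, g in gamma.items() if g == len(cliques[c])]
        if fits:
            c = min(fits)      # First Fit: the smallest suitable label
        else:
            c = len(cliques)   # otherwise open a new clique
            cliques.append([])
        cliques[c].append(v)
        label[v] = c
    return cliques


# Hypothetical example: the path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(greedy_clique_covering(path, [0, 1, 2, 3]))  # [[0, 1], [2, 3]]
print(greedy_clique_covering(path, [1, 2, 0, 3]))  # [[1, 2], [0], [3]]
```

The two calls show that the result depends on the permutation: the second ordering pairs the two middle vertices first and leaves both end vertices as 1-cliques, which is the situation the iterated greedy step is meant to repair.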
Block-based Mutation Operators and Iterated Greedy
block-based property: If we put k cliques as blocks in the permutation and run GCC once again, then we obtain at most k cliques (Culberson and Luo, 1996; Chalupa, 2012)
block-based mutation: We shuffle the blocks and re-run GCC. Such an algorithm behaves like a typical local search
every shuffling operation is equivalent to a sequence of consecutiveblock jump operations
motivation: Great experimental results on real-world networks(especially social networks)
Figure : Illustration of the block jump(j , 1,P) operator.
Iterated Greedy
Algorithm 1: The IG Algorithm for CCP
Input: graph G = [V, E]
Output: clique covering S of G
P = random_permutation(1, 2, ..., |V|)
while stopping criterion is not met
    [V1, V2, ..., Vk] = greedy_clique_covering(G, P)
    if ϑ∗(G) is known and k = ϑ∗(G)
        return S = {V1, V2, ..., Vk}
    P = [V1, V2, ..., Vk]
    P = random_permutation(V1, V2, ..., Vk)
return S = {V1, V2, ..., Vk}
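Algorithm 1 can be turned into a short runnable sketch. The version below is an assumption-laden illustration, not the author's code: the stopping criterion is a fixed iteration budget (`max_iters`), `target` stands in for a known ϑ∗(G), and the graph is given as a dict of neighbour sets.

```python
import random


def greedy_clique_covering(adj, perm):
    """First Fit greedy clique covering of the vertices in the order `perm`."""
    label, cliques = {}, []
    for v in perm:
        gamma = {}  # gamma[c] = number of neighbours of v already in clique c
        for u in adj[v]:
            if u in label:
                gamma[label[u]] = gamma.get(label[u], 0) + 1
        fits = [c for c, g in gamma.items() if g == len(cliques[c])]
        c = min(fits) if fits else len(cliques)
        if c == len(cliques):
            cliques.append([])
        cliques[c].append(v)
        label[v] = c
    return cliques


def iterated_greedy(adj, max_iters=200, target=None, seed=0):
    """Iterated greedy clique covering with block shuffling.

    max_iters, target and seed are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    perm = list(adj)
    rng.shuffle(perm)
    cliques = greedy_clique_covering(adj, perm)
    for _ in range(max_iters):
        if target is not None and len(cliques) == target:
            break
        blocks = list(cliques)
        rng.shuffle(blocks)  # shuffle whole cliques, keeping each one contiguous
        perm = [v for block in blocks for v in block]
        cliques = greedy_clique_covering(adj, perm)
    return cliques


# Hypothetical toy instance: a path on 6 vertices.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
covering = iterated_greedy(path)
print(len(covering), covering)
```

Because of the block-based property, the number of cliques never increases from one iteration to the next, so the most recent covering is always the best one seen and no separate bookkeeping of the best solution is needed.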
Iterated Greedy: An Interlude with Experimental Results
Table : The comparison of the approximations of ϑ(G) = χ(Ḡ) for each graph obtained by Brelaz’s heuristic (BRE), saturation-based GCC (SAT-GCC) and the IG heuristic with GCC (IG-GCC).
G                  BRE      SAT-GCC   IG-GCC
Erdos-Renyi uniform random graphs
unif1000_0.1       299      310       243
unif5000_0.1       1241     1288      1066
unif10000_0.1      2326     2389      2025
unif20000_0.01     7640     7817      6387
Leighton graphs from DIMACS instances
le450_15a          85       89        80
le450_15b          92       90        82
le450_15c          68       74        57
le450_15d          73       73        57
le450_25a          91       92        91
le450_25b          81       82        80
le450_25c          61       59        54
le450_25d          60       59        51
Social graphs
soc2000            1471     1473      1471
soc10000           6619     6633      6618
soc20000           12770    12804     12764
Lower Bounds for Clique Covering Number
Lemma. Let G be an undirected graph with minimum degree δmin(G ),clique covering number ϑ(G ), maximum independent set size α(G ) andmaximum clique size ω(G ). Then, ϑ(G ) is bounded in the following way:
max{α(G), |V| / ω(G)} ≤ ϑ(G) ≤ |V| − δmin(G). (1)
More generally:
αL(G ) ≤ ϑ(G ) ≤ ϑU(G ). (2)
In practice αL(G) will be a better lower bound. In the context of social networks, it is the size of some large group of people in which nobody knows anybody else.
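As a worked check of bound (1), consider the cycle on five vertices, C_5 (an example chosen here for illustration, not taken from the slides). Its largest cliques are single edges, so at least three cliques are needed to cover five vertices, and two edges plus one single vertex suffice:

```latex
\[
\alpha(C_5) = 2, \quad \omega(C_5) = 2, \quad |V| = 5, \quad
\delta_{\min}(C_5) = 2, \quad \vartheta(C_5) = 3,
\]
\[
\max\left\{ 2, \tfrac{5}{2} \right\} = 2.5 \;\le\; \vartheta(C_5) = 3 \;\le\; 5 - 2 = 3 .
\]
```

On this graph the term |V|/ω(G) is the stronger of the two lower bounds and the upper bound is tight.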
Randomized Local Search (RLS) for Maximum Independent Set
permutation of vertices → independent set (IS)
we begin with independent set S = ∅
in each iteration, we take the next vertex from the permutation...
... and add it to S if it can be added without violating the IS property
Randomized Local Search (RLS) for Maximum Independent Set
Algorithm 2: RLS^1_p Algorithm for the Maximum Independent Set Size
Input: graph G = [V, E]
Output: the size α(G) of the maximum independent set
P = random_permutation(1, 2, ..., |V|), P∗ = P, k∗ = 1
while stopping criterion is not met
    k = |greedy_independent_set(G, P)|
    if k ≥ k∗
        k∗ = k, P∗ = P
    j = uniformly_random(2, |V|)
    P = jump(j, 1, P∗)
return α(G) = k∗
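Algorithm 2 can be sketched as follows. This is a hedged illustration under stated assumptions: positions are 0-based here (the slide uses 1-based positions), `jump(j, i, P)` is read as "move the element at position j to position i", the stopping criterion is a fixed iteration budget, and the graph needs at least two vertices.

```python
import random


def greedy_independent_set(adj, perm):
    """Scan `perm` and add each vertex that has no neighbour in the set so far."""
    chosen = set()
    for v in perm:
        if not (adj[v] & chosen):
            chosen.add(v)
    return chosen


def jump(j, i, perm):
    """Return a copy of `perm` with the element at position j moved to position i."""
    p = list(perm)
    p.insert(i, p.pop(j))
    return p


def rls_independent_set_size(adj, max_iters=1000, seed=0):
    """Randomized local search; returns the best independent set size found.

    max_iters and seed are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    perm = list(adj)
    rng.shuffle(perm)
    best_perm, best_k = perm, 1
    for _ in range(max_iters):
        k = len(greedy_independent_set(adj, perm))
        if k >= best_k:                   # accept equal sizes: walk on plateaus
            best_k, best_perm = k, perm
        j = rng.randrange(1, len(perm))   # any position except the first
        perm = jump(j, 0, best_perm)      # move that vertex to the front
    return best_k


# Hypothetical example: a star with centre 0 and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(rls_independent_set_size(star, max_iters=10))  # 4
```

The returned value is the size of an independent set that was actually constructed, so it is always a valid lower bound on α(G) and therefore on ϑ(G), even when the search has not converged.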
Iterated Greedy Clique Covering: Instances for Experiments
web-based social network extracts
network science instances: adjective-noun network (Newman, 2002), collaboration network (Newman, 2002), social network (Zachary, 1977), college football network (Girvan and Newman, 2002), computer network
coappearance networks: network of coappearances of literary characters (Knuth, 1993)
Leighton graphs: quasirandom graphs modeling large scheduling problems (Leighton, 1979)
Erdos-Renyi uniform random graphs
Iterated Greedy Clique Covering: Results on Complex Networks
Table : Detailed computational results of our approach on complex network instances.
source of G                    file name    ϑ∗           succ.    iter.     CPU
Web-based social network extracts
Social network I.              soc500       ϑ ≤ 377      30/30    1888      < 1 s
  |V| = 500, |E| = 924                      ϑ ≥ 377      30/30    3764      < 1 s
Social network I.              soc1000      ϑ ≤ 759      30/30    3801      1 s
  |V| = 1000, |E| = 1876                    ϑ ≥ 759      30/30    7960      < 1 s
Social network I.              soc2000      ϑ ≤ 1471     30/30    7372      4 s
  |V| = 2000, |E| = 4124                    ϑ ≥ 1470     30/30    17430     < 1 s
Social network I.              soc10000     ϑ ≤ 6618     30/30    33276     89 s
  |V| = 10000, |E| = 28675                  ϑ ≥ 6618     17/30    124120    31 s
Social network I.              soc20000     ϑ ≤ 12764    30/30    64651     366 s
  |V| = 20000, |E| = 63245                  ϑ ≥ 12764    25/30    274529    147 s
Social network II.             soc52        ϑ ≤ 15       30/30    78        < 1 s
  |V| = 52, |E| = 822                       ϑ ≥ 15       30/30    508       < 1 s
Iterated Greedy Clique Covering: Results on Complex Networks
Table : Detailed computational results of our approach on complex network instances.
source of G                      file name      ϑ∗           succ.    iter.     CPU
Network science instances
Adjective-noun adjacencies       adjnoun        ϑ ≤ 55       30/30    364       < 1 s
  |V| = 112, |E| = 425                          ϑ ≥ 53       30/30    1145      < 1 s
Network science collaborations   netscience     ϑ ≤ 630      30/30    3453      1 s
  |V| = 1589, |E| = 2742                        ϑ ≥ 630      30/30    11874     < 1 s
Les Miserables network           lesmis         ϑ ≤ 35       30/30    176       < 1 s
  |V| = 77, |E| = 254                           ϑ ≥ 35       30/30    546       < 1 s
Zachary Karate Club              zachary        ϑ ≤ 20       30/30    101       < 1 s
  |V| = 34, |E| = 78                            ϑ ≥ 20       30/30    232       < 1 s
American College Football        football       ϑ ≤ 22       22/30    118       < 1 s
  |V| = 115, |E| = 616                          ϑ ≥ 21       30/30    1215      < 1 s
Snapshot of the Internet         as-22july06    ϑ ≤ 19661    30/30    98312     556 s
  |V| = 22963, |E| = 48436                      ϑ ≥ 19660    26/30    192136    128 s
Iterated Greedy Clique Covering: Results on Complex Networks
Table : Detailed computational results of our approach on complex network instances.
source of G                 file name    ϑ∗         succ.    iter.    CPU
Characters’ coappearance networks (Johnson and Trick, 1996)
Anna Karenina               anna         ϑ ≤ 80     30/30    402      < 1 s
  |V| = 138, |E| = 986                   ϑ ≥ 80     30/30    1022     < 1 s
David Copperfield           david        ϑ ≥ 36     30/30    182      < 1 s
  |V| = 87, |E| = 812                    ϑ ≤ 36     30/30    715      < 1 s
Huckleberry Finn            huck         ϑ ≤ 27     30/30    136      < 1 s
  |V| = 74, |E| = 602                    ϑ ≥ 27     30/30    516      < 1 s
Iliad and Odyssey           homer        ϑ ≤ 341    30/30    1711     < 1 s
  |V| = 561, |E| = 3258                  ϑ ≥ 341    30/30    4219     < 1 s
Jean Valjean                jean         ϑ ≤ 38     30/30    192      < 1 s
  |V| = 80, |E| = 508                    ϑ ≥ 38     30/30    574      < 1 s
Iterated Greedy Clique Covering: Overview of the Results
Table : Summary of the upper and lower bounds for ϑ obtained by IG for clique covering and RLS for maximum independent sets on complex network instances.
source of G                      file name      ϑL(G)    ϑU(G)    |V| / ϑU(G)
Web-based social network extracts
Social network I.                soc500         377      377      1.33
Social network I.                soc1000        759      759      1.32
Social network I.                soc2000        1470     1471     1.36
Social network I.                soc10000       6618     6618     1.51
Social network I.                soc20000       12764    12764    1.57
Social network II.               soc52          15       15       3.47
Network science instances
Adjective-noun adjacencies       adjnoun        53       55       2.04
Network science collaborations   netscience     690      690      2.30
Les Miserables network           lesmis         35       35       2.20
Zachary Karate Club              zachary        20       20       1.70
American College Football        football       21       22       5.23
Snapshot of the Internet         as-22july06    19660    19661    1.17
Characters’ coappearance networks (Johnson and Trick, 1996)
Anna Karenina                    anna           80       80       1.73
David Copperfield                david          36       36       2.42
Huckleberry Finn                 huck           27       27       2.74
Iliad and Odyssey                homer          341      341      1.65
Jean Valjean                     jean           38       38       2.11
Iterated Greedy Clique Covering: Results on Artificial Graphs
Table : Summary of the upper and lower bounds for ϑ obtained by our approach on synthetic graphs following the Leighton’s model.
source of G                      file name    ϑL(G)    ϑU(G)    |V| / ϑU(G)
Leighton graphs from DIMACS coloring instances (Johnson and Trick, 1996)
Leighton graph (15-colorable)    le450_15a    75       80       5.63
Leighton graph (15-colorable)    le450_15b    78       82       5.49
Leighton graph (15-colorable)    le450_15c    41       57       7.76
Leighton graph (15-colorable)    le450_15d    41       57       7.76
Leighton graph (25-colorable)    le450_25a    91       91       4.95
Leighton graph (25-colorable)    le450_25b    78       80       5.63
Leighton graph (25-colorable)    le450_25c    47       54       8.33
Leighton graph (25-colorable)    le450_25d    43       51       8.82
Iterated Greedy Clique Covering: Results on Artificial Graphs
Table : Summary of the upper and lower bounds for ϑ obtained by our approach on synthetic graphs following the Erdos-Renyi model.
source of G             file name         ϑL(G)    ϑU(G)    |V| / ϑU(G)
Erdos-Renyi uniform random graphs
Uniform random graph    unif1000_0.1      147      243      4.12
Uniform random graph    unif5000_0.1      617      1066     4.69
Uniform random graph    unif10000_0.1     1154     2025     4.94
Uniform random graph    unif20000_0.01    3796     6387     3.13
Instances: Degree and Clique Size Distributions
[Plots: degree distribution and clique size distribution panels for soc2000, soc20000 and netscience.]
Figure : The visualization of degree and clique size distributions for chosen real-world network test instances and the obtained solutions in log-log scale.
Instances: Degree and Clique Size Distributions
[Plots: degree distribution and clique size distribution panels for football, as-22july06 and homer.]
Figure : The visualization of degree and clique size distributions for chosen real-world network test instances and the obtained solutions in log-log scale (part II).
Instances: Degree and Clique Size Distributions
[Plots: degree distribution and clique size distribution panels for unif20000_0.01, le450_15c and le450_25b.]
Figure : The visualization of degree and clique size distributions for chosen synthetic test instances in log-log scale.
Interesting Points
It seems to be easy to find a good clique covering of a real-world social network, despite the fact that CCP is NP-hard.
This seems to be due to structural / statistical properties of the networks.
What if the approach gives only an interval [ϑL, ϑU]? How to cope with this?
Scaling to larger networks (10^5 - 10^6 vertices)?
Generalization to communities in general? Is it easy or hard to find “high-level” communities in real-world graphs?
How Iterated Greedy Really Works
IG is a local search algorithm: it works like walking down some stairs - fitness levels
plateaus: on each stair, the algorithm moves randomly until it findsthe edge - random walk
local optima: no guarantee that there is a way to a lower step fromthe current one
Analytical Results on GCC and Iterated Greedy
Table : An overview of analytical results on GCC and iterated greedy.
graph class                        GCC                                IG
paths                              approximation ratio 4/3            optimal, O(n^5)
trees                              approximation ratio ∈ [4/3, 2]     optimal, O(n^5)
growing networks                   constant approx. ratio             constant approx. ratio
complements of bipartite graphs    differs based on density           can get stuck with small prob.
worst-case result                  unknown (probably very bad)        can get stuck with prob. Ω(1)
Behavior of GCC and Block-based Mutation on Paths
Theorem. For IG with the block jump operator on paths, the expected time to obtain the optimal clique covering is upper bounded by O(n^5).
Sketch of proof.
- We have at most O(n) extra cliques - these represent the fitness levels.
- On each of them, there is a random walk of 1-cliques, until they meet and form a 2-clique.
- This random walk is almost fair - it takes O(n^3) time to upgrade to a better fitness level.
- The complexity of GCC is O(n).
- Multiplying the three factors gives O(n) · O(n^3) · O(n) = O(n^5).
Figure : Illustration of the case, when we have only two vertices to be joined, i.e.we have a solution with ϑ+ 1 cliques.
Sparse Complements of Bipartite Graphs (Biclique Graphs)
Theorem.
Let G = [V, E] be a graph with 2 planted cliques of size r. Let the set Eout of edges between the planted cliques satisfy |Eout| < r. Then, IG with GCC and random reorderings will converge to the optimal solution in O(n^3) time.
Sketch of proof. By induction from the simple case of two triangles.
Figure : Two triangles with 1 inter-clique edge and their coverings with 2 and 3cliques (on the left and in the middle) and two triangles with 2 inter-clique edgesand their covering with 4 cliques (on the right).
Complements of Bipartite Graphs
Lemma. On graph H, a uniformly random permutation will lead to a clique covering that is not improvable by block jump with probability at least 1/15.
Proof. The labeling should induce the three inter-clique edges, instead of the two triangles. In permutations with “embedded” inter-clique edges, there are 3 blocks (the wrong cliques) and 2^3 possible internal orderings of the vertices in these blocks. Thus, the probability of generating such a situation is at least 3! · 2^3 / 6! = 1/15.
Figure : The illustration of graph H, on which the IG can fail to converge withprobability at least 1/15.
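The counting argument in the proof can be verified with exact arithmetic. The snippet below only reproduces the count stated in the lemma (3 blocks, 2 internal orderings per block, 6 vertices); the variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

blocks = 3                                # the three "wrong" 2-cliques
block_orderings = factorial(blocks)       # 3! ways to order the blocks
internal_orderings = 2 ** blocks          # 2 orderings inside each block
all_permutations = factorial(2 * blocks)  # 6! permutations of the 6 vertices

p_stuck = Fraction(block_orderings * internal_orderings, all_permutations)
print(p_stuck)  # 1/15
```

Using `Fraction` keeps the result exact: 48 favourable permutations out of 720.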
Worst-case Result
Theorem. On graphs from class Hϑ/2, a uniformly random permutation will lead to a clique covering that is not improvable by block jump with probability at least 1 − (14/15)^(|V|/6).
Proof. Since the H subgraphs are disjoint, we can treat their vertices in the permutation as independent. The independence of the components implies that the probability that all subpermutations are the right ones is at most (14/15)^(|V|/6). Thus, the complementary probability is at least 1 − (14/15)^(|V|/6).
...
Figure : The illustration of graph Hϑ/2, which consists of ϑ/2 disjoint Hsubgraphs.
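To get a feeling for how quickly the bound 1 − (14/15)^(|V|/6) approaches 1, it can be evaluated for a few graph sizes. This is a small numerical sketch; the function name and the chosen sizes are assumptions, and |V| is taken to be a multiple of 6 (one H subgraph per 6 vertices).

```python
def stuck_probability_lower_bound(num_vertices):
    """Evaluate 1 - (14/15)^(|V|/6) for a graph made of |V|/6 disjoint copies of H."""
    copies = num_vertices // 6
    return 1 - (14 / 15) ** copies


for n in (6, 60, 600):
    print(n, round(stuck_probability_lower_bound(n), 4))
```

For a single copy of H the bound reduces to 1/15, and it grows towards 1 as copies are added, which is why the worst-case behaviour of a random starting permutation is described as getting stuck with probability Ω(1).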
Conclusions
iterated greedy (IG) clique covering: a constructive heuristic for CCP with a stochastic improvement mechanism
randomized local search for maximum independent set: serves as a lower bound (since we do not know the optimum for real-world graphs)
results: on 13 out of 17 real-world graphs our approach solved the problem optimally; on the rest of the graphs, a near-optimal solution was found
Open Problems
theoretical analysis of IG on models of complex networks
analytical results on RLS for maximum independent set in general
study of the impact of further scaling - what if we try to solve the problem for networks with 10^5 or 10^6 vertices
generalization to more “loose” community detection problems
References I
(Chalupa, 2013a) Chalupa, D.: Construction of Near-optimal Vertex Clique Covering for Real-world Networks. In: Computing and Informatics, to appear.
(Chalupa, 2013b) Chalupa, D.: An Analytical Investigation of Block-based Mutation Operators for Order-based Stochastic Clique Covering Algorithms. In: Blum, C., Alba, E. (eds.) Proceedings of the 15th annual conference on Genetic and evolutionary computation. pp. 495–502. GECCO ’13, ACM, New York, NY, USA (2013).
(Chalupa, 2012) Chalupa, D.: On the efficiency of an order-based representation in the clique covering problem. In: Soule, T., Moore, J. (eds.) Proceedings of the 14th annual conference on Genetic and evolutionary computation. pp. 353–360. GECCO ’12, ACM, New York, NY, USA (2012).
(Chalupa, 2011) Chalupa, D.: On the Ability of Graph Coloring Heuristics to Find Substructures in Social Networks. In: Information Sciences and Technologies Bulletin of ACM Slovakia, 3(2):51–54 (2011).
(Leskovec, 2010) Leskovec, J., Lang, K. J., Mahoney, M. W.: Empirical comparison of algorithms for network community detection. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 631–640. ACM, New York, NY, USA (2010).
References II
(Girvan and Newman, 2002) Girvan, M., Newman, M. E. J.: Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826 (2002).
(Newman, 2006) Newman, M. E. J.: Finding community structure in networks using the eigenvectors of matrices. arXiv:physics/0605087 (2006).
(Zachary, 1977) Zachary, W. W.: An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473 (1977).
(Knuth, 1993) Knuth, D. E.: The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley, Reading, MA (1993).
(Karp, 1972) Karp, R. M.: Reducibility among combinatorial problems. In: Miller, R., Thatcher, J. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York, NY, USA (1972).
(Johnson and Trick, 1996) Johnson, D. S., Trick, M.: Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26. American Mathematical Society (1996).
(Schaeffer, 2007) Schaeffer, S. E.: Graph clustering. Computer Science Review, 1(1):27–64 (2007).
References III
(Brelaz, 1979) Brelaz, D.: New methods to color vertices of a graph. Communications of the ACM, 22(4):251–256 (1979).
(Culberson and Luo, 1996) Culberson, J. C., Luo, F.: Exploring the k-colorable landscape with iterated greedy. In: Johnson, D. S., Trick, M. (eds.) Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge. pp. 245–284. American Mathematical Society (1996).
(Gendreau et al., 1993) Gendreau, M., Soriano, P., Salvail, L.: Solving the maximum clique problem using a tabu search approach. Annals of Operations Research, 41:385–403 (1993).
(Welsh and Powell, 1967) Welsh, D. J. A., Powell, M. B.: An upper bound for the chromatic number of a graph and its application to timetabling problems. The Computer Journal, 10(1):85–86 (1967).
(Boyer et al., 2005) Boyer, F., Morgat, A., Labarre, L., Pothier, J., Viari, A.: Syntons, metabolons and interactons: an exact graph-theoretical approach for exploring neighbourhood between genomic and functional data. Bioinformatics, 21(23):4209–4215 (2005).
(Burke et al., 2007) Burke, E. K., McCollum, B., Meisels, A., Petrovic, S., Qu, R.: A graph-based hyper-heuristic for educational timetabling problems. European Journal of Operational Research, 176(1):177–192 (2007).
References IV
(Gao et al., 2009) Gao, L., Sun, P., Song, J.: Clustering algorithms for detecting functional modules in protein interaction networks. Journal of Bioinformatics and Computational Biology, 7(1):217–242 (2009).
(Leighton, 1979) Leighton, F. T.: A graph coloring algorithm for large scheduling problems. Journal of Research of the National Bureau of Standards, 84(6):489–503 (1979).
(Smith et al., 1998) Smith, D. H., Hurley, S., Thiel, S. U.: Improving heuristics for the frequency assignment problem. European Journal of Operational Research, 107(1):76–86 (1998).
(Sun et al., 2008) Sun, J., Xie, Y., Zhang, H., Faloutsos, C.: Less is more: Sparse graph mining with compact matrix decomposition. Statistical Analysis and Data Mining, 1(1):6–22 (2008).
(Tang et al., 2011) Tang, J., Wang, T., Wang, J., Lu, Q., Li, W.: Using complex network features for fast clustering in the web. In: Sadagopan, S., Ramamritham, K., Kumar, A., Ravindra, M. P., Bertino, E., Kumar, R. (eds.) Proceedings of the 20th international conference companion on World wide web, WWW ’11, pp. 133–134. ACM, New York, NY, USA (2011).
(Rothenberg et al., 1996) Rothenberg, R. B., Potterat, J. J., Woodhouse, D. E.: Personal Risk Taking and the Spread of Disease: Beyond Core Groups. The Journal of Infectious Diseases, Supplement 2, 174:S144–S149 (1996).