Graph Modeled Data Clustering: Fixed Parameter Algorithms for Clique Generation

J. Gramm, J. Guo, F. Hüffner and R. Niedermeier, Theory of Computing Systems (2005)

Student: Vishal Kapoor




Page 2

Presentation Outline

• Problem Introduction

• Past Research

• Results of the paper

• CLUSTER EDITING
  – Kernelization
  – Search Tree

• CLUSTER DELETION

• Questions

Page 3

Problem Statement

• Make at most k changes to the edge set of an input graph so that it becomes a union of vertex-disjoint cliques.

• Each connected component is a clique in the resulting cluster graph.

• CLUSTER EDITING
  – Both edge additions and deletions are allowed

• CLUSTER DELETION
  – Only edge deletions are allowed

• Used in data clustering: vertices are adjacent iff their similarity exceeds a threshold
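To make the modeling step concrete, here is a minimal Python sketch that builds the threshold graph (two items adjacent iff their similarity exceeds a threshold) and tests whether it is already a cluster graph, i.e. whether every connected component is a clique. The function names, the adjacency-set representation, and the example similarity function are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def threshold_graph(items, similarity, threshold):
    """Adjacency sets: two items are adjacent iff similarity(a, b) > threshold."""
    adj = {x: set() for x in items}
    for a, b in combinations(items, 2):
        if similarity(a, b) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def is_cluster_graph(adj):
    """True iff every connected component of the graph is a clique."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        # Collect the connected component containing `start` (iterative DFS).
        component, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in component:
                    component.add(y)
                    stack.append(y)
        seen |= component
        # In a clique, every vertex is adjacent to all other vertices of its component.
        if any(len(adj[x]) != len(component) - 1 for x in component):
            return False
    return True
```

For example, with numbers as items and similarity -|a - b|, the items 1, 2, 3, 10, 11 with threshold -1.5 give the path 1-2-3 plus the edge 10-11, which is not a cluster graph. With threshold -2.5 the first component becomes a triangle and the test succeeds. CLUSTER EDITING asks how few edge changes turn the first case into the second kind of graph.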

Page 4

Past Research

• [2000] The study of both problems was started by Shamir et al., who proved that they are NP-complete and APX-hard

• [1996] Cai studied edge additions, edge deletions and vertex deletions that transform a graph into one with certain properties, and showed that these problems are FPT

• [2001] Natanzon et al. gave a general constant-factor (c-)approximation for edge deletion and editing problems on bounded-degree graphs, for certain graph properties

• [2002] Khot and Raman investigated the complexity of vertex deletion problems to find subgraphs with hereditary properties

Page 5

Results of this paper

• CLUSTER EDITING – O(2.27^k + |V|^3)

• CLUSTER DELETION – O(1.77^k + |V|^3)

• Using certain reduction rules, the resulting problem kernel has size O(k^3)
  – It has at most 2k^2 + k vertices and 2k^3 + k^2 edges.

Page 6

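[Figure: vertices u and v with their common neighbors (adjacent to both u and v) and non-common neighbors (adjacent to exactly one of them)]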

CLUSTER EDITING

Page 7

Reduction Rules

• Rule 1:

a. If u and v have more than k common neighbors, then {u,v} is set to ADDED and added to E if not already there (which decreases k by 1)

b. If u and v have more than k non-common neighbors, then {u,v} is set to DELETED and deleted from E if present (which decreases k by 1)

c. If u and v have both more than k common neighbors and more than k non-common neighbors, then the instance has no solution
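The reasoning behind Rule 1: if {u,v} were missing from the final graph, each of the more than k common neighbors would force a separate edit, and symmetrically for the non-common neighbors if {u,v} were kept. A minimal Python sketch of applying Rule 1 until nothing changes follows. It assumes adjacency sets and a dict `status` mapping `frozenset({x, y})` to "ADDED" or "DELETED"; the name `apply_rule1` and the choice to report a labeling conflict as "no solution" are illustrative assumptions.

```python
from itertools import combinations


def apply_rule1(adj, k, status):
    """Apply Rule 1 exhaustively (hypothetical helper).

    adj:    dict vertex -> set of neighbours (modified in place)
    status: dict frozenset({u, v}) -> "ADDED" or "DELETED" (modified in place)
    Returns the remaining budget k, or None if the instance has no solution.
    """
    changed = True
    while changed:                       # k shrinks, so re-check all pairs
        changed = False
        for u, v in combinations(list(adj), 2):
            others = set(adj) - {u, v}
            common = sum(1 for z in others if z in adj[u] and z in adj[v])
            noncommon = sum(1 for z in others if (z in adj[u]) != (z in adj[v]))
            pair = frozenset((u, v))
            if common > k and noncommon > k:                  # Rule 1c
                return None
            if common > k and status.get(pair) != "ADDED":    # Rule 1a
                if status.get(pair) == "DELETED":             # label conflict
                    return None
                status[pair] = "ADDED"
                if v not in adj[u]:
                    adj[u].add(v)
                    adj[v].add(u)
                    k -= 1                                    # one edit used
                changed = True
            elif noncommon > k and status.get(pair) != "DELETED":  # Rule 1b
                if status.get(pair) == "ADDED":               # label conflict
                    return None
                status[pair] = "DELETED"
                if v in adj[u]:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    k -= 1                                    # one edit used
                changed = True
            if k < 0:
                return None
    return k
```

Each pass inspects every pair and every third vertex, which is the O(|V|^3) cost per pass. On K4 minus one edge with k = 1, the missing edge has two common neighbors, so it is added and the function returns 0.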

Page 8

Reduction Rules

• Rule 2: For every three vertices u, v and w:

a. If {u,v} = ADDED and {u,w} = ADDED, then {v,w} is set to ADDED and added to E if not already there (which decreases k by 1)

b. If {u,v} = ADDED and {u,w} = DELETED, then {v,w} is set to DELETED and deleted from E if present (which decreases k by 1)
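Rule 2 is a propagation step: once u and v are forced into the same clique, w's relation to u decides its relation to v. A minimal Python sketch, assuming adjacency sets and a dict `status` from `frozenset({x, y})` to "ADDED"/"DELETED"; the helper name `apply_rule2` and treating an opposite, already fixed label as "no solution" are assumptions made for illustration.

```python
from itertools import permutations


def apply_rule2(adj, k, status):
    """Propagate ADDED/DELETED labels over all vertex triples until stable.

    adj:    dict vertex -> set of neighbours (modified in place)
    status: dict frozenset({x, y}) -> "ADDED" or "DELETED" (modified in place)
    Returns the remaining budget k, or None if the labels are contradictory.
    """
    def label(x, y):
        return status.get(frozenset((x, y)))

    changed = True
    while changed:
        changed = False
        for u, v, w in permutations(adj, 3):
            if label(u, v) != "ADDED":
                continue
            if label(u, w) == "ADDED":
                want = "ADDED"        # Rule 2a: u, v, w end up in one clique
            elif label(u, w) == "DELETED":
                want = "DELETED"      # Rule 2b: v and w end up in different cliques
            else:
                continue
            current = label(v, w)
            if current == want:
                continue
            if current is not None:   # the opposite label was already fixed
                return None
            status[frozenset((v, w))] = want
            if (want == "ADDED") != (w in adj[v]):
                # The edge set must change, which costs one unit of the budget.
                if want == "ADDED":
                    adj[v].add(w)
                    adj[w].add(v)
                else:
                    adj[v].discard(w)
                    adj[w].discard(v)
                k -= 1
                if k < 0:
                    return None
            changed = True
    return k
```

For instance, with {0,1} and {0,2} both labeled ADDED and {1,2} missing, one call adds {1,2}, labels it ADDED and spends one unit of k.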

Page 9

Running Time

• What is checked?
  – Every pair of vertices
  – For each pair, every other vertex, to decide whether it is a common or a non-common neighbor of the pair

• Takes time O(|V|^3)

Page 10

Kernel Size

• The kernel contains at most (2k+1)·k vertices and at most ((2k+1) choose 2)·k edges.

• Proof skipped
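Expanding the two kernel bounds gives explicit polynomials in k:

```latex
\begin{align*}
(2k+1)\cdot k &= 2k^2 + k && \text{(vertices)}\\
\binom{2k+1}{2}\cdot k &= \frac{(2k+1)\cdot 2k}{2}\cdot k = (2k^2 + k)\,k = 2k^3 + k^2 && \text{(edges)}
\end{align*}
```

So the kernel has O(k^2) vertices and O(k^3) edges.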

Page 11

Branch and Search Algorithm

• Repeatedly identify a bad triple (three vertices) in the kernel and repair it by adding or deleting an edge, until the graph is transformed into disjoint cliques

• Overall at most k edge additions/deletions are allowed

• 2 branching strategies:
  – Basic: O(3^k)
  – Advanced: O(2.27^k)

Page 12

• Lemma: A graph consists of disjoint cliques iff there are no three vertices u,v,w such that {u,v}, {u,w} are edges, but {v,w} is not an edge

• i.e. no three vertices may induce a path of length two: any three vertices span either no edge, a single edge, or a triangle

• Thus if a graph is not a union of disjoint cliques, then a bad triple can be found and repaired

Basic Branching

[Figure: a bad triple u, v, w with edges {u,v} and {u,w} and missing edge {v,w}]
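The lemma gives a direct test: a graph is a union of disjoint cliques exactly when no bad triple exists. A minimal Python sketch, assuming adjacency sets; the name `find_bad_triple` is hypothetical.

```python
from itertools import combinations


def find_bad_triple(adj):
    """Return (u, v, w) with {u,v} and {u,w} in E but {v,w} not in E, or None.

    adj: dict vertex -> set of neighbours.
    By the lemma, None means the graph is a union of disjoint cliques.
    """
    for u in adj:
        # Two neighbours of u that are not adjacent to each other form a bad triple.
        for v, w in combinations(adj[u], 2):
            if w not in adj[v]:
                return u, v, w
    return None
```

On the path 1-2-3 it returns vertex 2 as u, with 1 and 3 as v and w. On a triangle plus a disjoint edge it returns None. The scan costs O(|V|^3) in the worst case.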

Page 13

Basic Branch Algorithm

1. If G is a union of disjoint cliques, return SUCCESS

2. If k <= 0, return FAIL

3. Otherwise, find three vertices u, v, w such that edges {u,v} and {u,w} exist but {v,w} does not, and branch into three instances G' as follows:

a. E' = E – {u,v}, k' = k-1 and set {u,v} = DELETED

b. E' = E – {u,w}, k' = k-1 and set {u,w} and {v,w} = DELETED, {u,v} = ADDED

c. E' = E + {v,w}, k' = k-1 and set {u,v}, {u,w} and {v,w} = ADDED
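Steps 1 to 3 can be sketched as a plain recursive search tree in Python, without kernelization and without the advanced case distinction. The adjacency-set representation, the `status` dict of `frozenset` labels and the function names are assumptions for illustration. A branch is skipped when it would change an edge whose label was already fixed higher up in the tree, which is what the ADDED/DELETED marks are for.

```python
from itertools import combinations


def find_bad_triple(adj):
    """Return (u, v, w) with {u,v}, {u,w} in E and {v,w} not in E, or None."""
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w not in adj[v]:
                return u, v, w
    return None


def cluster_editing(adj, k, status=None):
    """Basic 3-way search tree: True iff at most k edits yield disjoint cliques.

    adj:    dict vertex -> set of neighbours (never modified; branches copy it)
    status: dict frozenset({x, y}) -> "ADDED"/"DELETED" fixed by earlier branches
    """
    status = {} if status is None else status
    triple = find_bad_triple(adj)
    if triple is None:              # step 1: already a union of disjoint cliques
        return True
    if k <= 0:                      # step 2: budget exhausted
        return False
    u, v, w = triple
    # (edge to change, add it?, labels to fix) for branches a, b and c.
    branches = [
        ((u, v), False, {(u, v): "DELETED"}),
        ((u, w), False, {(u, w): "DELETED", (v, w): "DELETED", (u, v): "ADDED"}),
        ((v, w), True, {(u, v): "ADDED", (u, w): "ADDED", (v, w): "ADDED"}),
    ]
    for (x, y), add, labels in branches:
        if frozenset((x, y)) in status:   # fixed label: this edge may not change
            continue
        new_adj = {a: set(nbrs) for a, nbrs in adj.items()}
        if add:
            new_adj[x].add(y)
            new_adj[y].add(x)
        else:
            new_adj[x].discard(y)
            new_adj[y].discard(x)
        new_status = dict(status)
        new_status.update({frozenset(p): lab for p, lab in labels.items()})
        if cluster_editing(new_adj, k - 1, new_status):
            return True
    return False
```

For example, a path on three vertices needs one edit, so `cluster_editing` returns True for k = 1 and False for k = 0. A 4-cycle needs two edits (for instance, deleting two opposite edges).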

Page 14

Branching Rules

[Figure: a bad triple u, v, w (marked ??) and its three repairs: BR1 deletes {u,v}, BR2 deletes {u,w}, BR3 adds {v,w}]

Page 15

Running Time

The algorithm solves CLUSTER EDITING in time O(3^k · k^2 + |V|^3):

1. O(|V|^3) is the time required to find all bad triples.

2. O(3^k) is the size of the search tree.

3. The kernel (modified input G') has |V| = O(k^2) vertices, so a newly added or deleted edge can create or destroy at most O(k^2) bad triples. [The list of bad triples can then be updated, only for the vertices affected by that edge, in O(k^2) time.]

Page 16


NOTE: The time can be improved to O(3^k + |V|^3) by repeatedly applying the kernelization at the nodes of the search tree, which is possible because the problem has a polynomial-size kernel

• Similarly, CLUSTER DELETION can be solved in time O(2^k + |V|^3)

Page 17

Advanced Branch Algorithm

1. Bad triples are considered, but their classification is refined further as follows:

[Figure: the three refined cases C1, C2 and C3 of a bad triple u, v, w]

Page 18

Branching for each case

• For C1: BR3 never gives a better solution than the better of BR1 and BR2, so it can be omitted

• If N(v) >= N(w), then the number of edges changed to make one clique is at least the number of edges changed to make two cliques

[Figure: case C1, a bad triple u, v, w with further neighbors u1, u2, v1, v2, w1, w2]

Page 19

• Edges added to make 1 clique:
  – {v,w} added = +1
  – {v,N(w)} added - {u,v} existing = N(v) - 1
  – {w,N(v)} added - {u,w} existing = N(w) - 1
  – joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
  – joining each N(v) and N(w) with u = N(v) + N(w)
  – Total = 2·[N(v) + N(w)] + ([N(w)+N(v)] choose 2) - 1  => (A)

• Edges changed to make 2 cliques:
  – N(w) deleted = N(w)
  – {v,N(w)} added - {u,v} existing = N(v) - 1
  – joining all N(w) and N(v) = ([N(w)+N(v)] choose 2)
  – joining each N(v) and N(w) with u = N(v) + N(w)
  – Total = N(v) + 3·N(w) + ([N(w)+N(v)] choose 2) - 1  => (B)

• Conclusion: since N(v) >= N(w), (A) >= (B).


Page 20

• Thus only BR1 and BR2 need to be used:

• So the resulting graphs are G \ {u,v} or G \ {u,w}, and the branching vector is (1,1)

• The recurrence relation is T(k) = 2·T(k-1), with characteristic root 2.

• So the search tree size for C1 is O(2^k).

[Figure: for C1, only the branches BR1 and BR2 are applied to the bad triple u, v, w (marked ??)]

Page 21

• For C2:

• Branching Vector = (1,2,3,2,3)

Page 22

• For C3:

• Branching Vector = (1,2,3,2,3)

Page 23

Overall Running Time

• Solve T(k) = T(k-1) + 2 [T(k-2) + T(k-3)]

• So the worst-case search tree size is O(2.27^k)

• Thus CLUSTER EDITING can be solved in O(2.27^k + |V|^3)
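A branching vector (t1, ..., tm) corresponds to the recurrence T(k) = T(k-t1) + ... + T(k-tm). The search tree then has O(x^k) nodes, where x is the largest positive root of x^-t1 + ... + x^-tm = 1. The vector (1,2,3,2,3) yields exactly the recurrence T(k) = T(k-1) + 2[T(k-2) + T(k-3)] above. The small Python helper below (the name `branching_number` is hypothetical) finds x by bisection, assuming every ti >= 1 so that the root lies between 1 and m.

```python
def branching_number(vector, iterations=200):
    """Largest real root x of sum(x ** -t for t in vector) == 1, by bisection.

    The recurrence T(k) = T(k - t1) + ... + T(k - tm) then gives O(x ** k).
    Assumes every t >= 1, so the root lies in [1, len(vector)].
    """
    def excess(x):
        # Strictly decreasing in x: positive below the root, negative above it.
        return sum(x ** -t for t in vector) - 1

    lo, hi = 1.0, float(len(vector)) + 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for vec in [(1, 1), (1, 1, 1), (1, 2, 3, 2, 3), (2, 3, 2, 3)]:
    print(vec, round(branching_number(vec), 2))
# (1, 1) 2.0
# (1, 1, 1) 3.0
# (1, 2, 3, 2, 3) 2.27
# (2, 3, 2, 3) 1.77
```

These values match the bounds on the slides: 2^k for case C1, 3^k for basic branching, about 2.27^k for CLUSTER EDITING and about 1.77^k for CLUSTER DELETION.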

Page 24

• Cases for CLUSTER DELETION:

• Branching vector = (2,3,2,3) and running time = O(1.77^k + |V|^3)

Page 25

Questions?

Thanks.