
Page 1: Near-Optimal LP Rounding for Correlation Clustering

Near-Optimal LP Rounding for Correlation Clustering

Grigory Yaroslavtsev

http://grigory.us

With Shuchi Chawla (University of Wisconsin, Madison), Konstantin Makarychev (Microsoft Research), Tselil Schramm (University of California, Berkeley)

Page 2: Near-Optimal LP Rounding for Correlation Clustering

Correlation Clustering

• Inspired by machine learning applications

• Practice: [Cohen, McCallum '01], [Cohen, Richman '02]

• Theory: [Bansal, Blum, Chawla '04]

Page 3: Near-Optimal LP Rounding for Correlation Clustering

Correlation Clustering: Example

• Minimize # of incorrectly classified pairs:

  # covered non-edges + # non-covered edges

• Min-CSP, but the # of labels is unbounded

[Example figure: 4 incorrectly classified pairs = 1 covered non-edge + 3 non-covered edges]
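The objective above is simple enough to compute directly. A minimal sketch, assuming the instance is an unweighted similarity graph; the function and the tiny 4-vertex instance are illustrative only.

```python
# Sketch: counting incorrectly classified pairs (the MinDisagree objective).
from itertools import combinations

def disagreements(vertices, edges, clustering):
    """# covered non-edges (same cluster, not similar) + # non-covered edges (split, similar)."""
    edges = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(vertices, 2):
        same_cluster = clustering[u] == clustering[v]
        is_edge = frozenset((u, v)) in edges
        if same_cluster and not is_edge:      # covered non-edge
            cost += 1
        elif not same_cluster and is_edge:    # non-covered edge
            cost += 1
    return cost

# Hypothetical toy instance: a triangle {1, 2, 3} plus a pendant edge (3, 4).
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(disagreements(V, E, {1: "A", 2: "A", 3: "A", 4: "B"}))  # -> 1 (the edge (3, 4) is cut)
```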

Page 4: Near-Optimal LP Rounding for Correlation Clustering

Approximating Correlation Clustering

• Minimize # of incorrectly classified pairs

  – ≈ 20000-approximation [Bansal, Blum, Chawla '04]

  – [Demaine, Emanuel, Fiat, Immorlica '04], [Charikar, Guruswami, Wirth '05], [Williamson, van Zuylen '07], [Ailon, Liberty '08], …

  – 2.5-approximation [Ailon, Charikar, Newman '05]

  – APX-hard [Charikar, Guruswami, Wirth '05]

• Maximize # of correctly classified pairs

  – (1 − ε)-approximation [Bansal, Blum, Chawla '04]

Page 5: Near-Optimal LP Rounding for Correlation Clustering

Correlation Clustering

One of the most successful clustering methods:

• Only uses qualitative information about similarities

• # of clusters is unspecified (selected to best fit the data)

• Applications: document/image deduplication (data from crowds or black-box machine learning)

• NP-hard [Bansal, Blum, Chawla '04], admits simple approximation algorithms with good provable guarantees

• Agnostic learning problem

Page 6: Near-Optimal LP Rounding for Correlation Clustering

Correlation Clustering

More:

• Survey [Wirth]

• KDD'14 tutorial: "Correlation Clustering: From Theory to Practice" [Bonchi, Garcia-Soriano, Liberty] http://francescobonchi.com/CCtuto_kdd14.pdf

• Wikipedia article: http://en.wikipedia.org/wiki/Correlation_clustering

Page 7: Near-Optimal LP Rounding for Correlation Clustering

Data-Based Randomized Pivoting

3-approximation (expected) [Ailon, Charikar, Newman]

Algorithm:

• Pick a random pivot vertex v

• Make a cluster {v} ∪ N(v), where N(v) is the set of neighbors of v

• Remove the cluster from the graph and repeat (see the code sketch below)

Modification: (3 + ε)-approx. in O(log² n / ε) rounds of MapReduce [Chierichetti, Dalvi, Kumar, KDD'14] http://grigory.us/blog/mapreduce-clustering
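A minimal Python sketch of this pivoting procedure, assuming the input is an unweighted similarity graph given as an adjacency set per vertex; the names and the toy instance are illustrative only.

```python
# Sketch: randomized pivoting for correlation clustering on a similarity graph.
import random

def pivot_clustering(vertices, neighbors, seed=0):
    """neighbors[v] = set of vertices marked similar to v. Returns a list of clusters."""
    rng = random.Random(seed)
    remaining = set(vertices)
    clusters = []
    while remaining:
        p = rng.choice(sorted(remaining))            # random pivot
        cluster = {p} | (neighbors[p] & remaining)   # the pivot together with its remaining neighbors
        clusters.append(cluster)
        remaining -= cluster                         # remove the cluster and repeat
    return clusters

# Hypothetical toy instance: a triangle {1, 2, 3} plus a pendant edge (3, 4).
nbrs = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(pivot_clustering([1, 2, 3, 4], nbrs))          # clusters depend on the random pivots chosen
```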

Page 8: Near-Optimal LP Rounding for Correlation Clustering

Data-Based Randomized Pivoting

• Pick a random pivot vertex p

• Make a cluster {p} ∪ N(p), where N(p) is the set of neighbors of p

• Remove the cluster from the graph and repeat

[Example figure: 8 incorrectly classified pairs = 2 covered non-edges + 6 non-covered edges]

Page 9: Near-Optimal LP Rounding for Correlation Clustering

Integer Program

Minimize: π‘₯𝑒𝑣𝑒,𝑣 ∈𝐸 + (1 βˆ’ π‘₯𝑒𝑣)𝑒,𝑣 βˆ‰πΈ

π‘₯𝑒𝑣 ≀ π‘₯𝑒𝑀 + π‘₯𝑀𝑣 βˆ€π‘’, 𝑣, 𝑀 π‘₯𝑒𝑣 ∈ *0,1+

β€’ Binary distance: β€’ π‘₯𝑒𝑣 = 0 𝑒 and 𝑣 in the same cluster β€’ π‘₯𝑒𝑣 = 1 𝑒 and 𝑣 in different clusters

β€’ Objective is exactly MinDisagree β€’ Triangle inequalities give transitivity:

β€’ π‘₯𝑒𝑀 = 0, π‘₯𝑀𝑣 = 0 β‡’ π‘₯𝑒𝑣 = 0 β€’ 𝑒 ∼ 𝑣 iff π‘₯𝑒𝑣 = 0 is an equivalence relation,

equivalence classes form a partition

Page 10: Near-Optimal LP Rounding for Correlation Clustering

Linear Program

• Embed vertices into a (pseudo)metric

• Integrality gap ≥ 2 − o(1)

Minimize: Σ_{(u,v) ∈ E} x_uv + Σ_{(u,v) ∉ E} (1 − x_uv)

subject to: x_uv ≤ x_uw + x_wv ∀ u, v, w;  x_uv ∈ [0, 1]

Page 11: Near-Optimal LP Rounding for Correlation Clustering

Integrality Gap

• IP cost = n − 2

• LP solution x_uv:
  – ½ for edges (u, vᵢ)
  – 1 for non-edges (vᵢ, vⱼ)
  – LP cost = ½ (n − 1)

• IP / LP = 2 − o(1)

Minimize: Σ_{(u,v) ∈ E} x_uv + Σ_{(u,v) ∉ E} (1 − x_uv)

subject to: x_uv ≤ x_uw + x_wv ∀ u, v, w;  x_uv ∈ [0, 1]

[Figure: star graph with center u and leaves v₁, …, v_{n−1}; LP distance ½ on every edge (u, vᵢ) and 1 on every non-edge (vᵢ, vⱼ)]
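A quick numeric check of this gap instance, as a sketch: it hard-codes the integral clustering stated on the slide ({u, v₁} plus singletons, cost n − 2) and the stated fractional solution, for a small hypothetical n.

```python
# Sketch: IP vs. LP cost on the star instance, using the solutions stated above.
from itertools import combinations

def star_gap(n):
    center = 0
    edges = {frozenset((center, v)) for v in range(1, n)}   # star: u joined to v_1 .. v_{n-1}
    ip_cost = n - 2                                          # cluster {u, v_1} + singletons cuts n-2 edges
    # Fractional solution: x = 1/2 on edges, 1 on non-edges (feasible for the triangle inequalities);
    # non-edges contribute 1 - x = 0, each edge contributes 1/2.
    lp_cost = sum(0.5 if frozenset(p) in edges else 0.0
                  for p in combinations(range(n), 2))
    return ip_cost / lp_cost

print(star_gap(10), star_gap(1000))  # -> about 1.78 and 1.998; the ratio tends to 2 - o(1)
```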

Page 12: Near-Optimal LP Rounding for Correlation Clustering

Can the LP be rounded optimally?

• 2.06-approximation
  – Previous: 2.5-approximation [Ailon, Charikar, Newman, JACM '08]

• 3-approximation for objects of k types (comparison data only between different types)
  – Matching integrality gap of 3
  – Previous: 4-approximation for 2 types [Ailon, Avigdor-Elgrabli, Liberty, van Zuylen, SICOMP '11]

• 1.5-approximation for weighted comparison data satisfying triangle inequalities
  – Integrality gap 1.2
  – Previous: 2-approximation [Ailon, Charikar, Newman, JACM '08]

Page 13: Near-Optimal LP Rounding for Correlation Clustering

LP-based Pivoting Algorithm [ACN]

Get all "distances" x_uv by solving the LP.

• Pick a random pivot vertex p

• Let S(p) be a random set containing every other vertex v with probability 1 − x_pv (independently)

• Make a cluster {p} ∪ S(p)

• Remove the cluster from the graph and repeat (see the code sketch below)

Minimize: Σ_{(u,v) ∈ E} x_uv + Σ_{(u,v) ∉ E} (1 − x_uv)

subject to: x_uv ≤ x_uw + x_wv ∀ u, v, w;  x_uv ∈ [0, 1]

Page 14: Near-Optimal LP Rounding for Correlation Clustering

LP-based Pivoting Algorithm [ACN]

• LP solution x_uv on the star instance:
  – ½ for edges (u, vᵢ)
  – 1 for non-edges (vᵢ, vⱼ)
  – LP cost = ½ (n − 1)

[Figure: the star instance, with LP distance ½ on edges and 1 on non-edges]

Get all "distances" x_uv by solving the LP.
• Pick a random pivot vertex p
• Let S(p) be a random set containing every other vertex v with probability 1 − x_pv (independently)
• Make a cluster {p} ∪ S(p)
• Remove the cluster from the graph and repeat

Page 15: Near-Optimal LP Rounding for Correlation Clustering

LP-based Pivoting Algorithm

β€’ π’—π’Š is a pivot (prob. 1 - 1/n) 𝔼 π‘π‘œπ‘ π‘‘|π’—π’Š is a pivot β‰ˆ ½𝑛 + Β½ 𝔼 π‘π‘œπ‘ π‘‘

β€’ 𝒖 is a pivot (prob. 1/n)

𝔼 π‘π‘œπ‘ π‘‘|𝒖 is a pivot β‰ˆπ‘›2

8

β€’ 𝔼 π‘π‘œπ‘ π‘‘ β‰ˆ 𝔼 π‘π‘œπ‘ π‘‘|π’—π’Š is a pivot +1

𝑛𝔼 π‘π‘œπ‘ π‘‘|𝒖 is a pivot =

n

2+1

2𝔼 π‘π‘œπ‘ π‘‘ +

𝑛

8β‡’ 𝔼 π‘π‘œπ‘ π‘‘ β‰ˆ

5𝑛

4

β€’ LP β‰ˆπ‘›

2 β‡’ 𝔼 π‘π‘œπ‘ π‘‘

πΏπ‘ƒβ‰ˆ5

2= approximation in the ACN analysis

[Figure: the star instance again, with LP distance ½ on edges (u, vᵢ) and 1 on non-edges (vᵢ, vⱼ)]
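A small Monte Carlo check of this calculation (a sketch, not from the slides): run LP-based pivoting on the star instance with the fractional solution above and compare the average cost to the LP cost (n − 1)/2. The ratio should approach 5/2 as n grows; for moderate n it comes out somewhat lower.

```python
# Sketch: simulate LP-based pivoting on the star instance (x = 1/2 on edges, 1 on non-edges).
import random
from itertools import combinations

def simulate_star_ratio(n, trials=1000, seed=0):
    rng = random.Random(seed)
    center = 0
    edges = {frozenset((center, v)) for v in range(1, n)}
    total = 0.0
    for _ in range(trials):
        remaining, clusters = set(range(n)), []
        while remaining:
            p = rng.choice(sorted(remaining))
            cluster = {p}
            for v in remaining - {p}:
                x = 0.5 if frozenset((p, v)) in edges else 1.0
                if rng.random() < 1.0 - x:          # include v with probability 1 - x_pv
                    cluster.add(v)
            clusters.append(cluster)
            remaining -= cluster
        label = {v: i for i, c in enumerate(clusters) for v in c}
        total += sum((label[a] == label[b]) != (frozenset((a, b)) in edges)
                     for a, b in combinations(range(n), 2))
    lp_cost = (n - 1) / 2.0
    return (total / trials) / lp_cost

print(simulate_star_ratio(100))  # -> roughly 2.4 here; approaches 5/2 as n grows
```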

Page 16: Near-Optimal LP Rounding for Correlation Clustering

Our (Data + LP)-Based Pivoting

• Data-Based Pivoting:
  f(x_pv, (p, v)) = 1, if (p, v) is an edge; 0, if (p, v) is a non-edge

• LP-Based Pivoting:
  f(x_pv, (p, v)) = 1 − x_pv

Get all "distances" x_uv by solving the LP.
• Pick a random pivot vertex p
• Let S(p) be a random set containing every other vertex v with probability f(x_pv, (p, v)) (independently)
• Make a cluster {p} ∪ S(p)
• Remove the cluster from the graph and repeat

Page 17: Near-Optimal LP Rounding for Correlation Clustering

Our (Data + LP)-Based Pivoting

• (Data + LP)-Based Pivoting:
  f(x_pv, (p, v)) = 1 − f₊(x_pv), if (p, v) is an edge
  f(x_pv, (p, v)) = 1 − x_pv, if (p, v) is a non-edge

[Plot: the rounding function f₊(x) for x ∈ [0, 1]]

𝒇+(π‘₯) =

{ 0, if π‘₯ ≀ π‘Ž 1, if π‘₯ β‰₯ 𝑏 π‘₯ βˆ’π‘Ž

π‘βˆ’π‘Ž

2, otherwise

π‘Ž = 0.19, 𝑏 = 0.5095

Page 18: Near-Optimal LP Rounding for Correlation Clustering

Analysis

• S_t = cluster constructed at pivoting step t

• V_t = set of vertices left before pivoting step t

[Figure: V₁ = V and the nested vertex sets V₂, V₃, … obtained by removing the clusters S₁, S₂, S₃, …]

Page 19: Near-Optimal LP Rounding for Correlation Clustering

Analysis

• ALG_t = Σ_{(u,v) ∈ E, u,v ∈ V_t} ( 𝟙[u ∈ S_t, v ∉ S_t] + 𝟙[u ∉ S_t, v ∈ S_t] ) + Σ_{(u,v) ∉ E, u,v ∈ V_t} 𝟙[u ∈ S_t, v ∈ S_t]

• LP_t = Σ_{(u,v) ∈ E, u,v ∈ V_t} 𝟙[u ∈ S_t or v ∈ S_t] · x_uv + Σ_{(u,v) ∉ E, u,v ∈ V_t} 𝟙[u ∈ S_t or v ∈ S_t] · (1 − x_uv)

• Suffices to show: 𝔼[ALG_t] ≤ α 𝔼[LP_t]

• 𝔼[ALG] = Σ_t 𝔼[ALG_t] ≤ α Σ_t 𝔼[LP_t] = α · LP

[Figure: step t removes S_t from V_t, leaving V_{t+1}; pairs with an endpoint in S_t are charged x_uv (edges) or 1 − x_uv (non-edges)]

Page 20: Near-Optimal LP Rounding for Correlation Clustering

Triangle-Based Analysis: Algorithm

β€’ π΄πΏπΊπ’˜ 𝒖, 𝒗 = 𝔼 π‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿ π‘œπ‘› 𝒖, 𝒗 𝒑 = π’˜; 𝒖 β‰  𝒗,π’˜ ∈ 𝑉𝑑-

=

{ 𝒇 π‘₯π’˜π’– 1 βˆ’ 𝒇 π‘₯π’˜π’— + 𝒇 π‘₯π’˜π’— 1 βˆ’ 𝒇 π‘₯π’˜π’– , if 𝒖, 𝒗 ∈ 𝐸 𝒇(π‘₯π’˜π’–) 𝒇(π‘₯π’˜π’—), if 𝒖, 𝒗 βˆ‰ 𝐸

π’˜

𝒖 𝒗

π’˜

𝒖 𝒗

π’˜

𝒖 𝒗

π’˜

𝒖 𝒗

Page 21: Near-Optimal LP Rounding for Correlation Clustering

Triangle-Based Analysis: LP

β€’ πΏπ‘ƒπ’˜ 𝒖, 𝒗 = 𝔼 𝐿𝑃 π‘π‘œπ‘›π‘‘π‘Ÿπ‘–π‘π‘’π‘‘π‘–π‘œπ‘› π‘œπ‘“ 𝒖, 𝒗 𝒑 = π’˜;𝒖 β‰  𝒗,π’˜ ∈ 𝑉𝑑 -

=

𝒇 π‘₯π’˜π’– + 𝒇 π‘₯π’˜π’— βˆ’ 𝒇 π‘₯π’˜π’– 𝒇 π‘₯π’˜π’— 𝒙𝒖𝒗, if 𝒖, 𝒗 ∈ 𝐸 𝒇 π‘₯π’˜π’– + 𝒇 π‘₯π’˜π’— βˆ’ 𝒇 π‘₯π’˜π’– 𝒇 π‘₯π’˜π’— (1 βˆ’ 𝒙𝒖𝒗), if 𝒖, 𝒗 βˆ‰ 𝐸 { π’˜

𝒖 𝒗

π’˜

𝒖 𝒗

π’˜

𝒖 𝒗 𝒙𝒖𝒗 𝒙𝒖𝒗 𝒙𝒖𝒗
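A direct transcription of these two per-triangle quantities, as a small sketch: here f_wu and f_wv stand for the inclusion probabilities f(x_wu) and f(x_wv) computed with whichever rounding function is being analyzed.

```python
# Sketch: expected ALG error and expected LP charge for the pair (u, v) when w is the pivot.
def alg_w(f_wu, f_wv, uv_is_edge):
    """Error probability: exactly one endpoint joins (edge) or both endpoints join (non-edge)."""
    if uv_is_edge:
        return f_wu * (1 - f_wv) + f_wv * (1 - f_wu)
    return f_wu * f_wv

def lp_w(f_wu, f_wv, x_uv, uv_is_edge):
    """Expected LP contribution charged at this step: Pr[u or v joins] times the pair's LP cost."""
    pr_removed = f_wu + f_wv - f_wu * f_wv
    return pr_removed * (x_uv if uv_is_edge else 1 - x_uv)
```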

Page 22: Near-Optimal LP Rounding for Correlation Clustering

Triangle-Based Analysis

• 𝔼[ALG_t] = (1/|V_t|) Σ_{w ∈ V_t} Σ_{u,v ∈ V_t} ALG_w(u, v) = (1/(2|V_t|)) Σ_{u,v,w ∈ V_t, u ≠ v} ALG_w(u, v)

• 𝔼[LP_t] = (1/|V_t|) Σ_{w ∈ V_t} Σ_{u,v ∈ V_t} LP_w(u, v) = (1/(2|V_t|)) Σ_{u,v,w ∈ V_t, u ≠ v} LP_w(u, v)

• Suffices to show that for all triangles (u, v, w):

  ALG_w(u, v) ≤ α · LP_w(u, v)

Page 23: Near-Optimal LP Rounding for Correlation Clustering

Triangle-Based Analysis

• For all triangles (u, v, w):

  ALG_w(u, v) ≤ α · LP_w(u, v)

• Each triangle:
  – arbitrary edge / non-edge configuration (4 in total)
  – arbitrary LP weights satisfying the triangle inequality

• For every fixed configuration this is a functional inequality in the LP weights (3 variables)

• α ≈ 2.06! And α ≥ 2.025 for any choice of f!

Page 24: Near-Optimal LP Rounding for Correlation Clustering

Our Results: Complete Graphs

• 2.06-approximation for complete graphs

• Can be derandomized (previous: [Hegde, Jain, Williamson, van Zuylen '08])

• Also works for real weights satisfying probability constraints

Minimize: Σ_{(u,v) ∈ E} x_uv + Σ_{(u,v) ∉ E} (1 − x_uv)

subject to: x_uv ≤ x_uw + x_wv ∀ u, v, w;  x_uv ∈ {0, 1}

Page 25: Near-Optimal LP Rounding for Correlation Clustering

Our Results: Triangle Inequalities

• Weights satisfying triangle inequalities and probability constraints:
  – c_uv ∈ [0, 1]
  – c_uv ≤ c_uw + c_wv ∀ u, v, w

• 1.5-approximation

• 1.2 integrality gap

Minimize: Σ_{u,v} ( (1 − c_uv) x_uv + c_uv (1 − x_uv) )

subject to: x_uv ≤ x_uw + x_wv ∀ u, v, w;  x_uv ∈ {0, 1}

Page 26: Near-Optimal LP Rounding for Correlation Clustering

Our Results: Objects of π’Œ types

• Objects of k types:
  – c_uv ∈ {0, 1}
  – E = edges of a complete k-partite graph

• 3-approximation

• Matching integrality gap of 3

Minimize: Σ_{(u,v) ∈ E} ( (1 − c_uv) x_uv + c_uv (1 − x_uv) )

subject to: x_uv ≤ x_uw + x_wv ∀ u, v, w;  x_uv ∈ {0, 1}

Page 27: Near-Optimal LP Rounding for Correlation Clustering

Thanks!

Better approximation:
• Can stronger convex relaxations help?
  – Integrality gap for the natural Semi-Definite Program is ≥ 1/(2 − √2) ≈ 1.7
  – Can LP/SDP hierarchies help?

Better running time:
• Avoid solving the LP?
• < 3-approximation in MapReduce?

Related scenarios:
• Better than 4/3-approximation for consensus clustering?
• o(log n)-approximation for arbitrary weights? (Would improve MultiCut; no constant factor is possible under UGC [Chawla, Krauthgamer, Kumar, Rabani, Sivakumar '06].)