Near-Optimal LP Rounding for Correlation Clustering
Grigory Yaroslavtsev
http://grigory.us
With Shuchi Chawla (University of Wisconsin, Madison), Konstantin Makarychev (Microsoft Research), Tselil Schramm (University of California, Berkeley)
Correlation Clustering
• Inspired by machine learning:
– Practice: [Cohen, McCallum '01], [Cohen, Richman '02]
– Theory: [Bansal, Blum, Chawla '04]
Correlation Clustering: Example
• Minimize # of incorrectly classified pairs:
# covered non-edges + # non-covered edges
• Min-CSP, but # of labels is unbounded
Example: 4 incorrectly classified pairs = 1 covered non-edge + 3 non-covered edges
Approximating Correlation Clustering
• Minimize # of incorrectly classified pairs:
– ≈ 20000-approximation [Bansal, Blum, Chawla '04]
– [Demaine, Emanuel, Fiat, Immorlica '04], [Charikar, Guruswami, Wirth '05], [Williamson, van Zuylen '07], [Ailon, Liberty '08], …
– 2.5-approximation [Ailon, Charikar, Newman '05]
– APX-hard [Charikar, Guruswami, Wirth '05]
• Maximize # of correctly classified pairs:
– (1 − ε)-approximation [Bansal, Blum, Chawla '04]
Correlation Clustering
One of the most successful clustering methods:
• Only uses qualitative information about similarities
• # of clusters unspecified (selected to best fit the data)
• Applications: document/image deduplication (data from crowds or black-box machine learning)
• NP-hard [Bansal, Blum, Chawla '04], admits simple approximation algorithms with good provable guarantees
• Agnostic learning problem
Correlation Clustering
More:
• Survey [Wirth]
• KDD'14 tutorial: "Correlation Clustering: From Theory to Practice" [Bonchi, Garcia-Soriano, Liberty] http://francescobonchi.com/CCtuto_kdd14.pdf
• Wikipedia article: http://en.wikipedia.org/wiki/Correlation_clustering
Data-Based Randomized Pivoting
3-approximation (in expectation) [Ailon, Charikar, Newman]
Algorithm:
• Pick a random pivot vertex p
• Make a cluster p ∪ N(p), where N(p) is the set of neighbors of p
• Remove the cluster from the graph and repeat
Modification: (3 + ε)-approximation in O(log² n / ε) rounds of MapReduce [Chierichetti, Dalvi, Kumar, KDD'14] http://grigory.us/blog/mapreduce-clustering
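The pivoting steps above can be sketched in a few lines of Python (a minimal sketch of the [Ailon, Charikar, Newman] pivoting rule; function and variable names are mine, not from the talk):

```python
import random

def pivot_cluster(n, edges, seed=0):
    """Data-based randomized pivoting: pick a random remaining vertex p,
    make the cluster p ∪ N(p) among remaining vertices, remove it, repeat."""
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(range(n))
    clusters = []
    while remaining:
        p = rng.choice(sorted(remaining))
        cluster = {p} | (adj[p] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters

def disagreements(n, edges, clusters):
    """# covered non-edges + # non-covered edges."""
    label = {u: i for i, c in enumerate(clusters) for u in c}
    e = {frozenset(p) for p in edges}
    return sum(1 for u in range(n) for v in range(u + 1, n)
               if (label[u] == label[v]) != (frozenset((u, v)) in e))
```

On a graph that is a disjoint union of cliques the algorithm recovers the cliques exactly, with zero disagreements; the expected 3-approximation guarantee applies to arbitrary inputs.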
Data-Based Randomized Pivoting
• Pick a random pivot vertex p
• Make a cluster p ∪ N(p), where N(p) is the set of neighbors of p
• Remove the cluster from the graph and repeat
Example: 8 incorrectly classified pairs = 2 covered non-edges + 6 non-covered edges
Integer Program
Minimize: Σ_{(u,v)∈E} x_uv + Σ_{(u,v)∉E} (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ {0, 1}
• Binary distance:
– x_uv = 0 ⇔ u and v in the same cluster
– x_uv = 1 ⇔ u and v in different clusters
• Objective is exactly MinDisagree
• Triangle inequalities give transitivity:
– x_uw = 0, x_wv = 0 ⇒ x_uv = 0
– u ∼ v iff x_uv = 0 is an equivalence relation; equivalence classes form a partition
Linear Program
• Embed vertices into a (pseudo)metric
• Integrality gap ≥ 2 − o(1)
Minimize: Σ_{(u,v)∈E} x_uv + Σ_{(u,v)∉E} (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ [0, 1]
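For small instances this LP can be solved directly. A sketch using `scipy.optimize.linprog` (assuming SciPy is available; helper names are mine):

```python
from itertools import combinations, permutations
import numpy as np
from scipy.optimize import linprog

def solve_cc_lp(n, edges):
    """Solve the correlation-clustering LP relaxation:
    minimize sum_{(u,v) in E} x_uv + sum_{(u,v) not in E} (1 - x_uv)
    s.t. x_uv <= x_uw + x_wv for all u, v, w; x_uv in [0, 1]."""
    pairs = list(combinations(range(n), 2))
    idx = {p: i for i, p in enumerate(pairs)}
    var = lambda u, v: idx[(u, v)] if u < v else idx[(v, u)]
    e = {frozenset(p) for p in edges}
    # Objective: +1 per edge variable, -1 per non-edge variable;
    # the constant # of non-edges is added back at the end.
    c = np.array([1.0 if frozenset(p) in e else -1.0 for p in pairs])
    const = sum(1 for p in pairs if frozenset(p) not in e)
    # Triangle inequalities: x_uv - x_uw - x_wv <= 0 for all triples.
    rows = []
    for u, v, w in permutations(range(n), 3):
        if u < v:
            row = np.zeros(len(pairs))
            row[var(u, v)] = 1.0
            row[var(u, w)] = -1.0
            row[var(w, v)] = -1.0
            rows.append(row)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  bounds=[(0.0, 1.0)] * len(pairs), method="highs")
    return res.fun + const, {p: res.x[i] for i, p in enumerate(pairs)}

# Star K_{1,4}: the integrality-gap instance, n = 5.
cost, x = solve_cc_lp(5, [(0, i) for i in range(1, 5)])
```

On the star the LP optimum is ½(n − 1) = 2, matching the fractional solution on the integrality-gap slide, while the best integral clustering costs n − 2 = 3.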
Integrality Gap
• IP cost = n − 2
• LP solution x_uv:
– x = ½ for edges (u, v_i)
– x = 1 for non-edges (v_i, v_j)
⇒ LP cost = ½(n − 1)
• IP / LP = 2 − o(1)
Minimize: Σ_{(u,v)∈E} x_uv + Σ_{(u,v)∉E} (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ [0, 1]
[Figure: star graph with center u and leaves v_1, …, v_{n−1}; edges (u, v_i) labeled ½, non-edges labeled 1]
Can the LP be rounded optimally?
• 2.06-approximation
– Previous: 2.5-approximation [Ailon, Charikar, Newman, JACM'08]
• 3-approximation for objects of k types (comparison data only between different types)
– Matching integrality gap of 3
– Previous: 4-approximation for 2 types [Ailon, Avigdor-Elgrabli, Liberty, van Zuylen, SICOMP'11]
• 1.5-approximation for weighted comparison data satisfying triangle inequalities
– Integrality gap 1.2
– Previous: 2-approximation [Ailon, Charikar, Newman, JACM'08]
LP-based Pivoting Algorithm [ACN]
Get all "distances" x_uv by solving the LP
• Pick a random pivot vertex p
• Let S(p) be a random set containing every other vertex v with probability 1 − x_pv (independently)
• Make a cluster p ∪ S(p)
• Remove the cluster from the graph and repeat
Minimize: Σ_{(u,v)∈E} x_uv + Σ_{(u,v)∉E} (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ [0, 1]
LP-based Pivoting Algorithm [ACN]
• LP solution x_uv:
– x = ½ for edges (u, v_i)
– x = 1 for non-edges (v_i, v_j)
⇒ LP cost = ½(n − 1)
Get all "distances" x_uv by solving the LP
• Pick a random pivot vertex p
• Let S(p) be a random set containing every other vertex v with probability 1 − x_pv (independently)
• Make a cluster p ∪ S(p)
• Remove the cluster from the graph and repeat
LP-based Pivoting Algorithm
• v_i is a pivot (prob. 1 − 1/n):
E[cost | v_i is a pivot] ≈ ½ n + ½ E[cost]
• u is a pivot (prob. 1/n):
E[cost | u is a pivot] ≈ n²/8
• E[cost] ≈ E[cost | v_i is a pivot] + (1/n) · E[cost | u is a pivot] ≈ n/2 + ½ E[cost] + n/8 ⇒ E[cost] ≈ 5n/4
• LP ≈ n/2 ⇒ E[cost] / LP ≈ 5/2 = approximation factor in the ACN analysis
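The ≈ 5n/4 estimate can be cross-checked with an exact recurrence for the star instance. The recurrence below is my own derivation following the slide's case analysis, so treat it as a sanity check rather than the talk's computation:

```python
def star_lp_pivot_cost(m):
    """Exact expected cost of LP-based pivoting [ACN] on a star with
    center u and m leaves, using the LP solution x(u, v_i) = 1/2 and
    x(v_i, v_j) = 1.  Cases for a star with k leaves:
      pivot = u      (prob 1/(k+1)): each leaf joins w.p. 1/2, so
        E[cost] = k/2 cut edges + k(k-1)/8 covered non-edges;
      pivot = leaf v (prob k/(k+1)): u joins w.p. 1/2;
        if not, cluster {v} costs 1 and we recurse on k-1 leaves;
        if so, cluster {v, u} cuts the other k-1 edges."""
    e = 0.0  # expected cost for a star with 0 leaves
    for k in range(1, m + 1):
        center = k / 2 + k * (k - 1) / 8
        leaf = 0.5 * (1 + e) + 0.5 * (k - 1)
        e = (center + k * leaf) / (k + 1)
    return e
```

Dividing by LP ≈ n/2 (here m/2), the ratio approaches 5/2 as the star grows, in line with the ACN analysis above.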
Our (Data + LP)-Based Pivoting
• Data-Based Pivoting: f(x_pv, (p, v)) = 1 if (p, v) is an edge; 0 if (p, v) is a non-edge
• LP-Based Pivoting: f(x_pv, (p, v)) = 1 − x_pv
Get all "distances" x_uv by solving the LP
• Pick a random pivot vertex p
• Let S(p) be a random set containing every other vertex v with probability f(x_pv, (p, v)) (independently)
• Make a cluster p ∪ S(p)
• Remove the cluster from the graph and repeat
Our (Data + LP)-Based Pivoting
• (Data + LP)-Based Pivoting:
f(x_pv, (p, v)) = 1 − f⁺(x_pv) if (p, v) is an edge; 1 − x_pv if (p, v) is a non-edge
[Plot: the rounding function f⁺(x) for x ∈ [0, 1]]
f⁺(x) = 0, if x ≤ a; 1, if x ≥ b; ((x − a)/(b − a))², otherwise
a = 0.19, b = 0.5095
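In code, the rounding curve and the resulting pivoting step look like this (a sketch; function names are mine, the thresholds a = 0.19 and b = 0.5095 are from the slide):

```python
import random

A, B = 0.19, 0.5095  # thresholds a, b from the slide

def f_plus(x):
    """Rounding curve f+: 0 up to a, 1 from b on, quadratic in between."""
    if x <= A:
        return 0.0
    if x >= B:
        return 1.0
    return ((x - A) / (B - A)) ** 2

def join_prob(x_pv, is_edge):
    """(Data + LP)-based pivoting: v joins the pivot's cluster with
    probability 1 - f+(x_pv) for edges and 1 - x_pv for non-edges."""
    return 1.0 - f_plus(x_pv) if is_edge else 1.0 - x_pv

def round_lp(n, edges, x, seed=0):
    """Pivot-based rounding of LP distances x (dict: frozenset pair -> value)."""
    rng = random.Random(seed)
    e = {frozenset(p) for p in edges}
    remaining = set(range(n))
    clusters = []
    while remaining:
        p = rng.choice(sorted(remaining))
        cluster = {p}
        for v in sorted(remaining - {p}):
            pair = frozenset((p, v))
            if rng.random() < join_prob(x[pair], pair in e):
                cluster.add(v)
        clusters.append(cluster)
        remaining -= cluster
    return clusters
```

With an integral LP solution (x = 0 inside clusters, 1 across) the rounding reproduces that clustering exactly; the interesting behavior is on fractional distances, where edges and non-edges are treated asymmetrically.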
Analysis
• S_t = cluster constructed at pivoting step t
• V_t = set of vertices left before pivoting step t
[Figure: V_1 = V and clusters S_1, S_2, S_3]
Analysis
• ALG_t = Σ_{(u,v)∈E; u,v∈V_t} ( 1[u ∈ S_t, v ∉ S_t] + 1[u ∉ S_t, v ∈ S_t] ) + Σ_{(u,v)∉E; u,v∈V_t} 1[u ∈ S_t, v ∈ S_t]
• LP_t = Σ_{(u,v)∈E; u,v∈V_t} 1[u ∈ S_t or v ∈ S_t] · x_uv + Σ_{(u,v)∉E; u,v∈V_t} 1[u ∈ S_t or v ∈ S_t] · (1 − x_uv)
• Suffices to show: E[ALG_t] ≤ C · E[LP_t]
• E[ALG] = Σ_t E[ALG_t] ≤ C · Σ_t E[LP_t] = C · LP
Triangle-Based Analysis: Algorithm
• ALG_w(u, v) = E[cost of pair (u, v) | p = w; u, v ∈ V_t]
= f(x_wu)(1 − f(x_wv)) + f(x_wv)(1 − f(x_wu)), if (u, v) ∈ E
= f(x_wu) · f(x_wv), if (u, v) ∉ E
[Figure: pivot w with pair (u, v), one picture per edge/non-edge configuration]
Triangle-Based Analysis: LP
• LP_w(u, v) = E[LP contribution of pair (u, v) | p = w; u, v ∈ V_t]
= (f(x_wu) + f(x_wv) − f(x_wu) f(x_wv)) · x_uv, if (u, v) ∈ E
= (f(x_wu) + f(x_wv) − f(x_wu) f(x_wv)) · (1 − x_uv), if (u, v) ∉ E
[Figure: pivot w with pair (u, v) and LP values x_uv]
Triangle-Based Analysis
• E[ALG_t] = (1/|V_t|) Σ_{w∈V_t} Σ_{u,v∈V_t} ALG_w(u, v) = (1/(2|V_t|)) Σ_{u,v,w∈V_t; u≠v} ALG_w(u, v)
• E[LP_t] = (1/|V_t|) Σ_{w∈V_t} Σ_{u,v∈V_t} LP_w(u, v) = (1/(2|V_t|)) Σ_{u,v,w∈V_t; u≠v} LP_w(u, v)
• Suffices to show that for all triangles (u, v, w), summing over the three choices of pivot within the triangle:
ALG_w(u, v) ≤ C · LP_w(u, v)
Triangle-Based Analysis
• For all triangles (u, v, w):
ALG_w(u, v) ≤ C · LP_w(u, v)
• Each triangle:
– Arbitrary edge / non-edge configuration (4 total)
– Arbitrary LP weights satisfying the triangle inequality
• For every fixed configuration: a functional inequality in the LP weights (3 variables)
• C ≈ 2.06! C ≥ 2.025 for any f!
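The per-triangle inequality can be spot-checked numerically: sum ALG_w(u, v) and LP_w(u, v) over the three pivot choices inside one triangle and compare the totals, over a grid of LP weights satisfying the triangle inequality and every edge/non-edge configuration (a sketch; names are mine, and for simplicity the code enumerates all 8 configurations rather than the 4 up to symmetry):

```python
from itertools import product

A, B = 0.19, 0.5095

def f_plus(x):
    if x <= A:
        return 0.0
    if x >= B:
        return 1.0
    return ((x - A) / (B - A)) ** 2

def p_join(x, is_edge):
    # Probability that a vertex at LP distance x joins the pivot's cluster.
    return 1.0 - f_plus(x) if is_edge else 1.0 - x

def triangle_alg_lp(xs, es):
    """ALG and LP contributions of one triangle, summed over the three
    pivot choices; xs = (x_uv, x_uw, x_vw), es = edge indicators."""
    x_uv, x_uw, x_vw = xs
    e_uv, e_uw, e_vw = es
    alg = lp = 0.0
    for (x1, e1), (x2, e2), (x12, e12) in [
        ((x_uw, e_uw), (x_vw, e_vw), (x_uv, e_uv)),  # pivot w, pair (u, v)
        ((x_uv, e_uv), (x_vw, e_vw), (x_uw, e_uw)),  # pivot v, pair (u, w)
        ((x_uv, e_uv), (x_uw, e_uw), (x_vw, e_vw)),  # pivot u, pair (v, w)
    ]:
        p1, p2 = p_join(x1, e1), p_join(x2, e2)
        # Mistake: edge cut by the cluster, or non-edge inside it.
        alg += p1 * (1 - p2) + p2 * (1 - p1) if e12 else p1 * p2
        # LP charge: P(pair removed at this step) times its LP cost.
        lp += (p1 + p2 - p1 * p2) * (x12 if e12 else 1.0 - x12)
    return alg, lp

worst = 0.0
grid = [i / 20 for i in range(21)]
for xs in product(grid, repeat=3):
    x_uv, x_uw, x_vw = xs
    if x_uv > x_uw + x_vw or x_uw > x_uv + x_vw or x_vw > x_uv + x_uw:
        continue  # LP weights must satisfy the triangle inequality
    for es in product((True, False), repeat=3):
        alg, lp = triangle_alg_lp(xs, es)
        if lp > 1e-12:
            worst = max(worst, alg / lp)
```

On this coarse grid the worst ratio stays below the 2.06 constant claimed on the slide; the actual proof establishes the inequality over all real weights, not just grid points.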
Our Results: Complete Graphs
• 2.06-approximation for complete graphs
• Can be derandomized (previous: [Hegde, Jain, Williamson, van Zuylen '08])
• Also works for real weights satisfying probability constraints
Minimize: Σ_{(u,v)∈E} x_uv + Σ_{(u,v)∉E} (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ {0, 1}
Our Results: Triangle Inequalities
• Weights satisfying triangle inequalities and probability constraints:
– c_uv ∈ [0, 1]
– c_uv ≤ c_uw + c_wv ∀u, v, w
• 1.5-approximation
• Integrality gap 1.2
Minimize: Σ_{u,v} (1 − c_uv) · x_uv + c_uv · (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ {0, 1}
Our Results: Objects of π types
• Objects of k types:
– c_uv ∈ {0, 1}
– E = edges of a complete k-partite graph
• 3-approximation
• Matching integrality gap of 3
Minimize: Σ_{(u,v)∈E} (1 − c_uv) · x_uv + c_uv · (1 − x_uv)
subject to: x_uv ≤ x_uw + x_wv ∀u, v, w; x_uv ∈ {0, 1}
Thanks!
Better approximation:
• Can stronger convex relaxations help?
– Integrality gap for the natural Semi-Definite Program is ≥ 1/(2 − √2) ≈ 1.7
– Can LP/SDP hierarchies help?
Better running time:
• Avoid solving the LP?
• < 3-approximation in MapReduce?
Related scenarios:
• Better than 4/3-approximation for consensus clustering?
• o(log n)-approximation for arbitrary weights (would improve MultiCut; no constant factor possible under UGC [Chawla, Krauthgamer, Kumar, Rabani, Sivakumar '06])