composable core-sets for diversity and coverage maximizationmahabadi/slides/coreset.pdf ·...

64
Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi (MIT) Mohammad Mahdian (Google) Vahab S. Mirrokni (Google)

Upload: others

Post on 21-Aug-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets for Diversity and Coverage

Maximization

Piotr Indyk (MIT) Sepideh Mahabadi (MIT)

Mohammad Mahdian (Google) Vahab S. Mirrokni (Google)

Page 2: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Core-Set Definition • Setup

– Set of 𝑛 points 𝑷 in 𝑑-dimensional space

– Optimize a function 𝑓

Page 3: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Core-Set Definition • Setup

– Set of 𝑛 points 𝑷 in 𝑑-dimensional space

– Optimize a function 𝑓

• 𝒄-Core-set: Small subset of points S ⊂ 𝑃 which suffices to 𝑐-approximate the optimal solution

• Maximization: 𝑓𝑜𝑜𝑜 𝑃

𝑐≤ 𝑓𝑜𝑜𝑜 𝑆 ≤ 𝑓𝑜𝑜𝑜(𝑃)

Page 4: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Core-Set Definition • Setup

– Set of 𝑛 points 𝑷 in 𝑑-dimensional space

– Optimize a function 𝑓

• 𝒄-Core-set: Small subset of points S ⊂ 𝑃 which suffices to 𝑐-approximate the optimal solution

• Maximization: 𝑓𝑜𝑜𝑜 𝑃

𝑐≤ 𝑓𝑜𝑜𝑜 𝑆 ≤ 𝑓𝑜𝑜𝑜(𝑃)

• Example

– Optimization Function: Distance of the two farthest points

Page 5: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Core-Set Definition • Setup

– Set of 𝑛 points 𝑷 in 𝑑-dimensional space

– Optimize a function 𝑓

• 𝒄-Core-set: Small subset of points S ⊂ 𝑃 which suffices to 𝑐-approximate the optimal solution

• Maximization: 𝑓𝑜𝑜𝑜 𝑃

𝑐≤ 𝑓𝑜𝑜𝑜 𝑆 ≤ 𝑓𝑜𝑜𝑜(𝑃)

• Example

– Optimization Function: Distance of the two farthest points

– 1-Core-set: Points on the convex hull.

Page 6: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets • Setup

– 𝑷𝟏,𝑷𝟐, … ,𝑷𝒎 are set of points in 𝑑-dimensional space – Optimize a function 𝑓 over their union 𝑷.

Page 7: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets • Setup

– 𝑷𝟏,𝑷𝟐, … ,𝑷𝒎 are set of points in 𝑑-dimensional space – Optimize a function 𝑓 over their union 𝑷.

• 𝒄-Composable Core-sets: Subsets of points S1 ⊂ 𝑃1, S2 ⊂ 𝑃2, … , Sm ⊂ 𝑃𝑚 points such that the solution of the union of the core-sets approximates the solution of the point sets.

• Maximization : 1𝑐𝑓𝑜𝑜𝑜 𝑃1 ∪ ⋯∪ 𝑃𝑚 ≤ 𝑓opt S1 ∪ ⋯∪ 𝑆𝑚 ≤ 𝑓𝑜𝑜𝑜(𝑃1 ∪⋯∪ 𝑃𝑚)

Page 8: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets • Setup

– 𝑷𝟏,𝑷𝟐, … ,𝑷𝒎 are set of points in 𝑑-dimensional space – Optimize a function 𝑓 over their union 𝑷.

• 𝒄-Composable Core-sets: Subsets of points S1 ⊂ 𝑃1, S2 ⊂ 𝑃2, … , Sm ⊂ 𝑃𝑚 points such that the solution of the union of the core-sets approximates the solution of the point sets.

• Maximization : 1𝑐𝑓𝑜𝑜𝑜 𝑃1 ∪ ⋯∪ 𝑃𝑚 ≤ 𝑓opt S1 ∪ ⋯∪ 𝑆𝑚 ≤ 𝑓𝑜𝑜𝑜(𝑃1 ∪⋯∪ 𝑃𝑚)

• Example: two farthest points

Page 9: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets • Setup

– 𝑷𝟏,𝑷𝟐, … ,𝑷𝒎 are set of points in 𝑑-dimensional space – Optimize a function 𝑓 over their union 𝑷.

• 𝒄-Composable Core-sets: Subsets of points S1 ⊂ 𝑃1, S2 ⊂ 𝑃2, … , Sm ⊂ 𝑃𝑚 points such that the solution of the union of the core-sets approximates the solution of the point sets.

• Maximization : 1𝑐𝑓𝑜𝑜𝑜 𝑃1 ∪ ⋯∪ 𝑃𝑚 ≤ 𝑓opt S1 ∪ ⋯∪ 𝑆𝑚 ≤ 𝑓𝑜𝑜𝑜(𝑃1 ∪⋯∪ 𝑃𝑚)

• Example: two farthest points

Page 10: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets • Setup

– 𝑷𝟏,𝑷𝟐, … ,𝑷𝒎 are set of points in 𝑑-dimensional space – Optimize a function 𝑓 over their union 𝑷.

• 𝒄-Composable Core-sets: Subsets of points S1 ⊂ 𝑃1, S2 ⊂ 𝑃2, … , Sm ⊂ 𝑃𝑚 points such that the solution of the union of the core-sets approximates the solution of the point sets.

• Maximization : 1𝑐𝑓𝑜𝑜𝑜 𝑃1 ∪ ⋯∪ 𝑃𝑚 ≤ 𝑓opt S1 ∪ ⋯∪ 𝑆𝑚 ≤ 𝑓𝑜𝑜𝑜(𝑃1 ∪⋯∪ 𝑃𝑚)

• Example: two farthest points

Page 11: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Streaming Computation

• Streaming Computation: – Processing sequence of 𝑛 data elements “on the fly” – limited Storage

Page 12: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Streaming Computation

• Streaming Computation: – Processing sequence of 𝑛 data elements “on the fly” – limited Storage

• 𝒄-Composable Core-set of size 𝒌 – Chunks of size 𝑛𝑛 , thus number of chunks = 𝑛/𝑛

Page 13: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Streaming Computation

• Streaming Computation: – Processing sequence of 𝑛 data elements “on the fly” – limited Storage

• 𝒄-Composable Core-set of size 𝒌 – Chunks of size 𝑛𝑛 , thus number of chunks = 𝑛/𝑛 – Core-set for each chunk – Total Space: 𝑛 𝑛/𝑛 + 𝑛𝑛 = 𝑂( 𝑛𝑛) – Approximation Factor: 𝑐

Page 14: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Distributed Systems • Streaming Computation • Distributed System:

– Each machine holds a block of data. – A composable core-set is computed and sent to the server

Page 15: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Distributed Systems • Streaming Computation • Distributed System:

– Each machine holds a block of data. – A composable core-set is computed and sent to the server

• Map-Reduce Model: • One round of Map-Reduce • 𝑛/𝑛 mappers each getting 𝑛𝑛 points • Mapper computes a composable core-set of size 𝑛 • Will be passed to a single reducer

Page 16: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Similarity Search • Streaming Computation • Distributed System • Similarity Search: Small output size

Page 17: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Similarity Search • Streaming Computation • Distributed System • Similarity Search: Small output size • Good to have result from each

cluster: relevant and diverse

Page 18: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Similarity Search • Streaming Computation • Distributed System • Similarity Search: Small output size • Good to have result from each

cluster: relevant and diverse • Diverse Near Neighbor Problem

[Abbar, Amer-Yahia, Indyk, Mahabadi WWW’13] [Abbar, Amer-Yahia, Indyk, Mahabadi, Varadarajan, SoCG’13]

Page 19: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Applications – Similarity Search • Streaming Computation • Distributed System • Similarity Search: Small output size • Good to have result from each

cluster: relevant and diverse • Diverse Near Neighbor Problem

[Abbar, Amer-Yahia, Indyk, Mahabadi WWW’13] [Abbar, Amer-Yahia, Indyk, Mahabadi, Varadarajan, SoCG’13]

– uses Locality Sensitive Hashing (LSH) and Composable Core-sets techniques.

Page 20: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Diversity Maximization Problem

• A set of 𝑛 points 𝑃 in metric space (Δ,𝑑𝑑𝑑𝑑)

• Optimization Problem: – Find a subset of 𝑛 points 𝑆 which

maximizes Diversity

k=4 n = 6

Page 21: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Diversity Maximization Problem

• A set of 𝑛 points 𝑃 in metric space (Δ,𝑑𝑑𝑑𝑑)

• Optimization Problem: – Find a subset of 𝑛 points 𝑆 which

maximizes Diversity • Diversity:

– Minimum pairwise distance (Remote Edge)

k=4 n = 6

Page 22: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Diversity Maximization Problem

• A set of 𝑛 points 𝑃 in metric space (Δ,𝑑𝑑𝑑𝑑)

• Optimization Problem: – Find a subset of 𝑛 points 𝑆 which

maximizes Diversity • Diversity:

– Minimum pairwise distance (Remote Edge)

– Sum of Pairwise distances (Remote Clique) k=4

n = 6

Page 23: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Diversity Maximization Problem

• A set of 𝑛 points 𝑃 in metric space (Δ,𝑑𝑑𝑑𝑑)

• Optimization Problem: – Find a subset of 𝑛 points 𝑆 which

maximizes Diversity • Diversity:

– Minimum pairwise distance (Remote Edge)

– Sum of Pairwise distances (Remote Clique)

• Long list of variants [Chandra and Halldorsson ‘01]

k=4 n = 6

Page 24: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Diversity Functions Diversity function over

a set 𝑆 of 𝑛 point Description

Remote-edge Minimum Pairwise Distance: min𝑜,𝑞∈𝑆

𝑑𝑑𝑑𝑑(𝑝, 𝑞)

Remote-clique Sum of Pairwise Distances : ∑ 𝑑𝑑𝑑𝑑(𝑝, 𝑞)𝑜,𝑞∈𝑆

Remote-tree Weight of Minimum Spanning Tree (MST) of the set 𝑆

Remote-cycle Weight of minimum Traveling Salesman Tour (TSP) of the set 𝑆

Remote-star Weight of minimum star: min𝑜∈𝑆

∑ 𝑑𝑑𝑑𝑑(𝑝, 𝑞)𝑞∈𝑆

Remote-Pseudoforest Sum of the distance of each point to its nearest neighbor ∑ min

𝑞∈𝑆𝑑𝑑𝑑𝑑(𝑝, 𝑞)𝑜∈𝑆

Remote-Matching Weight of minimum perfect Matching of the set 𝑆

Max-Coverage How well the points cover each coordinate

�max𝑜∈𝑆

𝑝𝑖

𝑑

𝑖=1

Page 25: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Our Results Diversity function Offline ApproxFactor Composable Coreset

Approx factor [Our Results]

Remote-edge Minimum Pairwise Distance 𝑂(1) [Tmair 91][White 91]

[Ravi et al 94]

𝑶(𝟏)

Remote-clique Sum of Pairwise Distances 𝑂(1) [Hassin et al 97]

𝑶(𝟏)

Remote-tree Weight of MST 𝑂(1) [Halldorsson et al 99]

𝑶(𝟏)

Remote-cycle Weight of minimum TSP 𝑂(1) [Halldorsson et al 99]

𝑶(𝟏)

Remote-star Weight of minimum star 𝑂(1) [Chandra&Halldorsson 01]

𝑶(𝟏)

Remote-Pseudoforest Sum of the distance of each point to its nearest neighbor

𝑂(log 𝑛) [Chandra&Halldorsson 01]

𝑶(𝐥𝐥𝐥 𝒌)

Remote-Matching Weight of minimum perfect Matching 𝑂(log 𝑛) [Chandra&Halldorsson 01]

𝑶(𝐥𝐥𝐥 𝒌)

Max-Coverage How well the points cover each coordinate

�max𝑜∈𝑆

𝑝𝑖

𝑑

𝑖=1

𝑂(1) [Feige 98]

No Composable Coreset of Poly size in 𝒌 with app. factor

𝒌𝒍𝒍𝒍 𝒌

Page 26: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Review of Offline Algorithms

• We have a set of 𝑛 point 𝑃 • Goal: find a subset 𝑆 of size 𝑛 which

maximizes the diversity

Page 27: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

The Greedy Algorithm

• Used for minimum-pairwise distance

Page 28: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

The Greedy Algorithm

• Used for minimum-pairwise distance • Greedy Algorithm [Ravi, Rosenkrantz,

Tayi] [Gonzales] – Choose an arbitrary point – Repeat k-1 times

• Add the point whose minimum distance to the currently chosen points is maximized

Page 29: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

The Greedy Algorithm

• Used for minimum-pairwise distance • Greedy Algorithm [Ravi, Rosenkrantz,

Tayi] [Gonzales] – Choose an arbitrary point – Repeat k-1 times

• Add the point whose minimum distance to the currently chosen points is maximized

• Remote-edge: computes a 2-

approximate set

Page 30: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Local Search Algorithm • Used for sum of pairwise distances

Page 31: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Local Search Algorithm • Used for sum of pairwise distances • Algorithm [Abbasi, Mirrokni, Thakur]

– Initialize 𝑆 with an arbitrary set of 𝑛 points which contains the two farthest points

Page 32: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Local Search Algorithm • Used for sum of pairwise distances • Algorithm [Abbasi, Mirrokni, Thakur]

– Initialize 𝑆 with an arbitrary set of 𝑛 points which contains the two farthest points – While there exists a swap that improves

diversity by a factor of 1 + 𝜖𝑛

Page 33: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Local Search Algorithm • Used for sum of pairwise distances • Algorithm [Abbasi, Mirrokni, Thakur]

– Initialize 𝑆 with an arbitrary set of 𝑛 points which contains the two farthest points – While there exists a swap that improves

diversity by a factor of 1 + 𝜖𝑛

» Perform the swap

Page 34: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Local Search Algorithm • Used for sum of pairwise distances • Algorithm [Abbasi, Mirrokni, Thakur]

– Initialize 𝑆 with an arbitrary set of 𝑛 points which contains the two farthest points – While there exists a swap that improves

diversity by a factor of 1 + 𝜖𝑛

» Perform the swap

Page 35: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Local Search Algorithm • Used for sum of pairwise distances • Algorithm [Abbasi, Mirrokni, Thakur]

– Initialize 𝑆 with an arbitrary set of 𝑛 points which contains the two farthest points – While there exists a swap that improves

diversity by a factor of 1 + 𝜖𝑛

» Perform the swap

• For Remote-Clique – Number of rounds: log 1+𝜖𝑛

𝑛2 = 𝑂(𝑛𝜖

log 𝑛)

– Approximation factor is constant.

Page 36: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Composable Core-sets

• Greedy Algorithm Computes a 3-composable core-set for minimum pairwise distance

• Local Search Algorithm Computes a constant factor composable core-set for sum of pairwise distances.

Page 37: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖

Page 38: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c

Page 39: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c

Page 40: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c

Page 41: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c

Page 42: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c Let 𝑟 be their maximum diversity , 𝑟 = max

𝑖 𝑑𝑑𝑣 𝑆𝑖 , Note: divk 𝑆 ≥ 𝑟

Page 43: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c Let 𝑟 be their maximum diversity , 𝑟 = max

𝑖 𝑑𝑑𝑣 𝑆𝑖 , Note: divk 𝑆 ≥ 𝑟

Case 1: one of 𝑆𝑖 has diversity as good as the optimum: 𝑟 ≥ 𝑶 𝑑𝑑𝑣 𝑂𝑃𝑂

Page 44: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c Let 𝑟 be their maximum diversity , 𝑟 = max

𝑖 𝑑𝑑𝑣 𝑆𝑖 , Note: divk 𝑆 ≥ 𝑟

Case 1: one of 𝑆𝑖 has diversity as good as the optimum: 𝑟 ≥ 𝑶 𝑑𝑑𝑣 𝑂𝑃𝑂 Case 2: : 𝑟 ≤ 𝑶(𝑑𝑑𝑣(𝑂𝑃𝑂))

Page 45: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c Let 𝑟 be their maximum diversity , 𝑟 = max

𝑖 𝑑𝑑𝑣 𝑆𝑖 , Note: divk 𝑆 ≥ 𝑟

Case 1: one of 𝑆𝑖 has diversity as good as the optimum: 𝑟 ≥ 𝑶 𝑑𝑑𝑣 𝑂𝑃𝑂 Case 2: : 𝑟 ≤ 𝑶(𝑑𝑑𝑣(𝑂𝑃𝑂)) • find a one-to-one mapping 𝜇 from 𝑂𝑃𝑂 = {𝑜1,⋯ , 𝑜𝑘} to 𝑆 = 𝑆1 ∪ ⋯∪ 𝑆𝑚 s.t.

𝑑𝑑𝑑𝑑 𝑜𝑖 ,𝜇 𝑜𝑖 ≤ 𝑶(𝑟)

Page 46: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c Let 𝑟 be their maximum diversity , 𝑟 = max

𝑖 𝑑𝑑𝑣 𝑆𝑖 , Note: divk 𝑆 ≥ 𝑟

Case 1: one of 𝑆𝑖 has diversity as good as the optimum: 𝑟 ≥ 𝑶 𝑑𝑑𝑣 𝑂𝑃𝑂 Case 2: : 𝑟 ≤ 𝑶(𝑑𝑑𝑣(𝑂𝑃𝑂)) • find a one-to-one mapping 𝜇 from 𝑂𝑃𝑂 = {𝑜1,⋯ , 𝑜𝑘} to 𝑆 = 𝑆1 ∪ ⋯∪ 𝑆𝑚 s.t.

𝑑𝑑𝑑𝑑 𝑜𝑖 ,𝜇 𝑜𝑖 ≤ 𝑶(𝑟) • Replacing 𝑜𝑖 with 𝜇(𝑜𝑖) has still large diversity • 𝑑𝑑𝑣 𝜇 𝑜𝑖 is approximately as good as 𝑑𝑑𝑣 𝑜𝑖

Page 47: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Let 𝑃1,⋯ ,𝑃𝑚 be the set of points , 𝑃 = ⋃𝑃𝑖 𝑆1,⋯ , 𝑆𝑚 be their core-sets, S = ⋃𝑆𝑖 Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣𝑘(𝑃) / c Let 𝑂𝑃𝑂 = 𝑜1,⋯ , 𝑜𝑘 be the optimal solution Goal: 𝑑𝑑𝑣𝑘 𝑆 ≥ 𝑑𝑑𝑣(𝑂𝑃𝑂) / c Let 𝑟 be their maximum diversity , 𝑟 = max

𝑖 𝑑𝑑𝑣 𝑆𝑖 , Note: divk 𝑆 ≥ 𝑟

Case 1: one of 𝑆𝑖 has diversity as good as the optimum: 𝑟 ≥ 𝑶 𝑑𝑑𝑣 𝑂𝑃𝑂 Case 2: : 𝑟 ≤ 𝑶(𝑑𝑑𝑣(𝑂𝑃𝑂)) • find a one-to-one mapping 𝜇 from 𝑂𝑃𝑂 = {𝑜1,⋯ , 𝑜𝑘} to 𝑆 = 𝑆1 ∪ ⋯∪ 𝑆𝑚 s.t.

𝑑𝑑𝑑𝑑 𝑜𝑖 ,𝜇 𝑜𝑖 ≤ 𝑶(𝑟) • Replacing 𝑜𝑖 with 𝜇(𝑜𝑖) has still large diversity • 𝑑𝑑𝑣 𝜇 𝑜𝑖 is approximately as good as 𝑑𝑑𝑣 𝑜𝑖 • The actual mapping 𝜇 depends on the specific diversity measure we are considering.

Page 48: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Maximum k-Coverage • A set of 𝑛 points 𝑃 in 𝑑-dimensional space • Each dimension corresponds to a feature. • Goal: choose a set of 𝑛 points 𝑆 in 𝑃 which maximizes the total

coverage: – cov S = ∑ max

𝑠∈𝑆𝑑𝑖𝑑

𝑖=1

Page 49: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Maximum k-Coverage • A set of 𝑛 points 𝑃 in 𝑑-dimensional space • Each dimension corresponds to a feature. • Goal: choose a set of 𝑛 points 𝑆 in 𝑃 which maximizes the total

coverage: – cov S = ∑ max

𝑠∈𝑆𝑑𝑖𝑑

𝑖=1

• Special Case hamming space: • A collection of 𝑛 sets 𝑃 • Over the universe 𝑈 = 1, … ,𝑑 • Goal: choose 𝑛 sets 𝑆 = {𝑆1, … , 𝑆𝑘} in 𝑃 whose union is

maximized.

Page 50: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Maximum k-Coverage • A set of 𝑛 points 𝑃 in 𝑑-dimensional space • Each dimension corresponds to a feature. • Goal: choose a set of 𝑛 points 𝑆 in 𝑃 which maximizes the total

coverage: – cov S = ∑ max

𝑠∈𝑆𝑑𝑖𝑑

𝑖=1

• Special Case hamming space: • A collection of 𝑛 sets 𝑃 • Over the universe 𝑈 = 1, … ,𝑑 • Goal: choose 𝑛 sets 𝑆 = {𝑆1, … , 𝑆𝑘} in 𝑃 whose union is

maximized.

• Theorem: for any 𝛼 < 𝑘log 𝑘

and any constant 𝛽 > 1, there is no 𝛼-composable core-set of size 𝑛𝛽

Page 51: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = 1,⋯ ,𝑂 𝑛4

Page 52: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = {1,⋯ ,𝑂 𝑛4 } • Let 𝑉𝑖 be subset of size 𝑛 of 𝑈

Page 53: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = {1,⋯ ,𝑂 𝑛4 } • Let 𝑉𝑖 be subset of size 𝑛 of 𝑈 • 𝑃𝑖 is a collection of subsets of size 𝑛 from 𝑉𝑖

Page 54: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = {1,⋯ ,𝑂 𝑛4 } • Let 𝑉𝑖 be subset of size 𝑛 of 𝑈 • 𝑃𝑖 is a collection of subsets of size 𝑛 from 𝑉𝑖 • 𝑃𝑖 has cardinality 𝑘

𝑘

Page 55: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = {1,⋯ ,𝑂 𝑛4 } • Let 𝑉𝑖 be subset of size 𝑛 of 𝑈 • 𝑃𝑖 is a collection of subsets of size 𝑛 from 𝑉𝑖 • 𝑃𝑖 has cardinality 𝑘

𝑘

We show there exists 𝑉1,⋯ ,𝑉𝑂 𝑘 such that – 𝑉𝑖 ∖ 𝑉1 has size 𝑛 – 𝑉𝑖 ∖ 𝑉1 and 𝑉𝑗 ∖ 𝑉1 are disjoint for 𝑑 ≠ 𝑗

Page 56: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = {1,⋯ ,𝑂 𝑛4 } • Let 𝑉𝑖 be subset of size 𝑛 of 𝑈 • 𝑃𝑖 is a collection of subsets of size 𝑛 from 𝑉𝑖 • 𝑃𝑖 has cardinality 𝑘

𝑘

We show there exists 𝑉1,⋯ ,𝑉𝑂 𝑘 such that – 𝑉𝑖 ∖ 𝑉1 has size 𝑛 – 𝑉𝑖 ∖ 𝑉1 and 𝑉𝑗 ∖ 𝑉1 are disjoint for 𝑑 ≠ 𝑗

• Using 𝑛 sets everything in ∪ 𝑉𝑖 can be covered,

that is 𝑂(𝑛3/2) elements.

Page 57: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Proof Idea Build a set of instances 𝑃1,⋯ ,𝑃𝑂 𝑘 let 𝑈 = {1,⋯ ,𝑂 𝑛4 } • Let 𝑉𝑖 be subset of size 𝑛 of 𝑈 • 𝑃𝑖 is a collection of subsets of size 𝑛 from 𝑉𝑖 • 𝑃𝑖 has cardinality 𝑘

𝑘

We show there exists 𝑉1,⋯ ,𝑉𝑂 𝑘 such that – 𝑉𝑖 ∖ 𝑉1 has size 𝑛 – 𝑉𝑖 ∖ 𝑉1 and 𝑉𝑗 ∖ 𝑉1 are disjoint for 𝑑 ≠ 𝑗

• Using 𝑛 sets everything in ∪ 𝑉𝑖 can be covered,

that is 𝑂(𝑛3/2) elements. • Using core-sets only 𝑉1 + 𝑛 log 𝑛 = O(k log k )

can be covered

Page 58: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Conclusion

• Applications of composable core-sets

Page 59: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Conclusion

• Applications of composable core-sets • We showed construction of composable core-sets for a

wide range of diversity measures

Page 60: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Conclusion

• Applications of composable core-sets • We showed construction of composable core-sets for a

wide range of diversity measures • We showed non existence of core-sets of polynomial

size in 𝑛 for maximum coverage

Page 61: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Conclusion

• Applications of composable core-sets • We showed construction of composable core-sets for a

wide range of diversity measures • We showed non existence of core-sets of polynomial

size in 𝑛 for maximum coverage • Open Problems

– Are there any other applications of composable core-sets?

Page 62: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Conclusion

• Applications of composable core-sets • We showed construction of composable core-sets for a

wide range of diversity measures • We showed non existence of core-sets of polynomial

size in 𝑛 for maximum coverage • Open Problems

– Are there any other applications of composable core-sets? – Is there a general characterization of measures for which

composable core-sets exist?

Page 63: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Conclusion

• Applications of composable core-sets • We showed construction of composable core-sets for a

wide range of diversity measures • We showed non existence of core-sets of polynomial

size in 𝑛 for maximum coverage • Open Problems

– Are there any other applications of composable core-sets? – Is there a general characterization of measures for which

composable core-sets exist? – Better approximation factors?

Page 64: Composable Core-sets for Diversity and Coverage Maximizationmahabadi/slides/Coreset.pdf · Composable Core-sets for Diversity and Coverage Maximization Piotr Indyk (MIT) Sepideh Mahabadi

Thank You!

Questions?