
Page 1: Submodular Function Optimization

Submodular Function Optimization

http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html

Acknowledgement: these slides are based on the lecture notes of Prof. Andreas Krause, Prof. Jeff Bilmes, Prof. Francis Bach and Prof. Shaddin Dughmi

1/84


Outline

1 What is submodularity?
  Examples in recommendation sets
  Definition

2 Submodular maximization

3 Submodular minimization

4 Applications of submodular maximization


Outline

1 What is submodularity?
  Examples in recommendation sets
  Definition

2 Submodular maximization

3 Submodular minimization

4 Applications of submodular maximization


Case study: News article recommendation



Interactive recommendation

Number of recommendations k to choose from is large. Similar articles → similar click-through rates!

Performance depends on query / context. Similar users → similar click-through rates!

Need to compile sets of k recommendations (instead of only one). Similar sets → similar click-through rates!


News recommendation

Which set of articles satisfies most users?



Sponsored search

Which set of ads should be displayed to maximize revenue?



Relevance vs. Diversity

Users may have different interests / queries may be ambiguous

E.g., "jaguar", "squash", · · ·

Want to choose a set that is relevant to as many users as possible

Users may choose from the set the article they're most interested in

Want to optimize both relevance and diversity


Simple abstract model

Suppose we're given a set W of users and a collection V of articles/ads. Each article i is relevant to a set of users Si ⊆ W.

For now suppose this is known!

For each set of articles A ⊆ V define

F(A) = | ∪i∈A Si |

Want to select k articles to maximize "users covered":

max|A|≤k F(A)

Number of sets A grows exponentially in k! Finding the optimal A is NP-hard.


Maximum coverage

Given: Collection V of sets, utility function F(A)

Want: A∗ ⊆ V such that

A∗ = argmax|A|≤k F(A)

NP-hard!

Greedy algorithm:
Start with A0 = ∅
For i = 1 to k:
    s∗ = argmax_s F(Ai−1 ∪ {s})
    Ai = Ai−1 ∪ {s∗}

How well does this simple heuristic do?
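The greedy loop above can be sketched in a few lines of Python. This is a minimal illustration on made-up data (the sets and names are purely illustrative, not from the slides):

```python
def coverage(sets, A):
    """F(A) = number of users covered by the articles in A."""
    covered = set()
    for i in A:
        covered |= sets[i]
    return len(covered)

def greedy_max_coverage(sets, k):
    """Repeatedly add the article with the largest marginal gain."""
    A = []
    for _ in range(k):
        best = max((i for i in sets if i not in A),
                   key=lambda i: coverage(sets, A + [i]))
        A.append(best)
    return A

# toy instance: 4 articles, each relevant to a set of users
sets = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {7},
}
A = greedy_max_coverage(sets, 2)
print(A, coverage(sets, A))  # ['a', 'c'] 6
```

Note that greedy picks "a" (covers 4 users) and then "c" (adds 2 new users), even though "b" alone is larger than "c": only the marginal gain matters.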


Approximation guarantee

Theorem [Nemhauser, Fisher, Wolsey '78]
Under some natural conditions, the greedy algorithm produces a solution A where F(A) ≥ (1 − 1/e) · optimal value (≈ 63%).

This result holds for utility functions F with 2 properties:
F is (nonnegative) monotone: if A ⊆ B then 0 ≤ F(A) ≤ F(B)
F is submodular: "diminishing returns"


Outline

1 What is submodularity?
  Examples in recommendation sets
  Definition

2 Submodular maximization

3 Submodular minimization

4 Applications of submodular maximization


Set Functions

Ground set: inputs are subsets of some finite set. Given a set X, the power set is V := 2^X = {A | A ⊆ X}.

A set function takes as input a subset of the ground set X and outputs a real number:
F : 2^X → R

It is common in the literature to use either X or V as the ground set. We will follow this inconsistency in the literature and will inconsistently use either X or V as our ground set (hopefully not in the same equation; if so, please point this out).


Set Functions

If F is a modular function, then for any A, B ⊆ V we have equality:

F(A) + F(B) = F(A ∩ B) + F(A ∪ B)

If F is a modular function, it may be written as

F(A) = F(∅) + ∑a∈A ( F({a}) − F(∅) )

Modular set functions: associate a weight wi with each i ∈ X, and set F(S) = ∑i∈S wi

Discrete analogue of linear functions

Other possibly useful properties a set function may have:
Monotone: if A ⊆ B ⊆ X, then F(A) ≤ F(B)
Nonnegative: F(A) ≥ 0 for all A ⊆ X
Normalized: F(∅) = 0
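The modular identity above can be checked directly in Python. A minimal sketch with made-up weights (names illustrative):

```python
# a modular function is a weight sum: F(A) = sum of per-element weights
w = {"a": 2.0, "b": -1.0, "c": 0.5}
F = lambda A: sum(w[x] for x in A)

A, B = {"a", "b"}, {"b", "c"}
# modular <=> equality (not just >=) holds in the submodularity condition
print(F(A) + F(B) == F(A | B) + F(A & B))  # True
```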


Submodular Functions

Definition 1
A set function F : 2^V → R is submodular if and only if

F(A) + F(B) ≥ F(A ∩ B) + F(A ∪ B)

for all A, B ⊆ V.

"Uncrossing" two sets reduces their total function value.

Definition
A set function F : 2^V → R is supermodular if and only if −F is submodular.
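For small ground sets, Definition 1 can be verified exhaustively. A brute-force checker (exponential in |V|, so only for toy examples; the two test functions are our own illustrations):

```python
import math
from itertools import chain, combinations

def subsets(V):
    """All subsets of V, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(V), r) for r in range(len(V) + 1))]

def is_submodular(F, V):
    """Check F(A)+F(B) >= F(A∪B)+F(A∩B) for every pair of subsets."""
    S = subsets(V)
    return all(F(A) + F(B) + 1e-9 >= F(A | B) + F(A & B)
               for A in S for B in S)

V = {1, 2, 3}
print(is_submodular(lambda A: math.sqrt(len(A)), V))  # True  (concave of |A|)
print(is_submodular(lambda A: len(A) ** 2, V))        # False (supermodular)
```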


Submodular Functions

Definition 2 (diminishing returns)
A set function F : 2^V → R is submodular if and only if

F(B ∪ {s}) − F(B) ≤ F(A ∪ {s}) − F(A)
(gain of adding element s to the large set B) ≤ (gain of adding s to the small set A)

for all A ⊆ B ⊆ V and s ∉ B.

The marginal value of an additional element exhibits "diminishing marginal returns": the incremental "value", "gain", or "cost" of s decreases (diminishes) as the context in which s is considered grows from A to B.


Submodular: Consumer Costs of Living

Consumer costs are very often submodular: the marginal cost of one more unit typically diminishes as total consumption grows. (The original slides illustrate this with figures showing the diminishing-returns view.)


Submodular Functions

Definition 3 (group diminishing returns)
A set function F : 2^V → R is submodular if and only if

F(B ∪ C) − F(B) ≤ F(A ∪ C) − F(A)

for all A ⊆ B ⊆ V and C ⊆ V\B.

This means that the incremental "value", "gain", or "cost" of a set C decreases (diminishes) as the context in which C is considered grows from A to B.


Equivalence of Definitions

Definition 2 =⇒ Definition 3
Let C = {c1, . . . , ck} ⊆ V\B. Writing the gain as a telescoping sum and applying Definition 2 termwise (valid since A ∪ {c1, . . . , ci−1} ⊆ B ∪ {c1, . . . , ci−1} and ci ∉ B ∪ {c1, . . . , ci−1}):

F(A ∪ C) − F(A)
= ∑_{i=1}^{k} ( F(A ∪ {c1, . . . , ci}) − F(A ∪ {c1, . . . , ci−1}) )
≥ ∑_{i=1}^{k} ( F(B ∪ {c1, . . . , ci}) − F(B ∪ {c1, . . . , ci−1}) )
= F(B ∪ C) − F(B)


Equivalence of Definitions

Definition 1 =⇒ Definition 2
To prove (2), let A ⊆ B and s ∉ B. Set A′ = A ∪ {s} and B′ = B, so that A′ ∩ B′ = A and A′ ∪ B′ = B ∪ {s}. Apply (1):

F(A ∪ {s}) + F(B) = F(A′) + F(B′)
≥ F(A′ ∩ B′) + F(A′ ∪ B′)
= F(A) + F(B ∪ {s})

Definition 2 =⇒ Definition 1
The case A = B is trivial, so assume A ≠ B. Define A′ = A ∩ B, C = A\B and B′ = B, and apply group diminishing returns (Definition 3, which follows from Definition 2):

F(A′ ∪ C) − F(A′) ≥ F(B′ ∪ C) − F(B′)
⇐⇒ F(A) − F(A ∩ B) ≥ F(A ∪ B) − F(B)
⇐⇒ F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)


Submodularity

Submodular functions have a long history in economics, game theory, combinatorial optimization, electrical networks, and operations research.

They are gaining importance in machine learning as well.

Arbitrary set functions are hopelessly difficult to optimize, while the minimum of a submodular function can be found in polynomial time, and the maximum can be constant-factor approximated in low-order polynomial time.

Submodular functions share properties in common with both convex and concave functions.


Example: Set cover

F is submodular: for A ⊆ B and s ∉ B,

F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)
(gain of adding set s to the small solution A) ≥ (gain of adding s to the large solution B)

Natural example:
Sets S1, S2, · · · , Sn
F(A) = size of the union of the Si, i ∈ A (e.g., number of satisfied users):

F(A) = | ∪i∈A Si |


Closedness properties

F1, · · · , Fm submodular functions on V and λ1, · · · , λm ≥ 0

Then: F(A) = ∑i λi Fi(A) is submodular!

Submodularity is closed under nonnegative linear combinations.

Extremely useful fact: Fθ(A) submodular for each θ ⇒ ∑θ P(θ) Fθ(A) submodular!

Multi-objective optimization: F1, · · · , Fm submodular, λi > 0 ⇒ ∑i λi Fi(A) submodular


Probabilistic set cover

Document coverage function:
coverd(c) = probability that document d covers concept c, e.g., how strongly d covers c. It can model how relevant concept c is for user u.

Set coverage function:

coverA(c) = 1 − ∏d∈A (1 − coverd(c))

Probability that at least one document in A covers c.

Objective:

max|A|≤k F(A) = ∑c wc coverA(c)

where wc is the weight of concept c.

The objective function is submodular.
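The probabilistic coverage objective is short to implement. A sketch with made-up cover probabilities and concept weights (all numbers and names illustrative):

```python
# cover[d][c] = probability that document d covers concept c (made-up numbers)
cover = {
    "d1": {"jaguar-car": 0.9, "jaguar-cat": 0.1},
    "d2": {"jaguar-car": 0.2, "jaguar-cat": 0.8},
    "d3": {"jaguar-car": 0.5, "jaguar-cat": 0.5},
}
w = {"jaguar-car": 1.0, "jaguar-cat": 1.0}  # concept weights w_c

def F(A):
    """F(A) = sum_c w_c * (1 - prod_{d in A} (1 - cover_d(c)))."""
    total = 0.0
    for c, wc in w.items():
        p_missed = 1.0
        for d in A:
            p_missed *= 1.0 - cover[d].get(c, 0.0)
        total += wc * (1.0 - p_missed)
    return total

# diminishing returns: adding d3 helps less once d1, d2 are already chosen
print(F(["d3"]) - F([]) >= F(["d1", "d2", "d3"]) - F(["d1", "d2"]))  # True
```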


A model of Influence in Social Networks

Given a graph G = (V,E), each v ∈ V corresponds to a person; to each v we attach an activation function fv : 2^V → [0, 1] depending only on its neighbors, i.e., fv(A) = fv(A ∩ Γ(v)).

Goal - Viral Marketing: find a small subset S ⊆ V of individuals to directly influence, and thus indirectly influence the greatest number of other individuals (via the social network G).

We define a function f : 2^V → Z+ that models the ultimate influence of an initial set S of nodes based on the following iterative process: at each step, a given set of nodes S is activated, and we activate a new node v ∈ V\S if fv(S) ≥ U[0, 1] (where U[0, 1] is a uniform random number between 0 and 1).

It can be shown for many fv (including simple linear functions, and cases where fv is itself submodular) that f is submodular.


Example: Influence in social networks [Kempe, Kleinberg, Tardos KDD '03]

Which nodes are most influential?
V = {Alice, Bob, Charlie, Dorothy, Eric, Fiona}
F(A) = expected number of people influenced by set A


Influence in social networks is submodular [Kempe, Kleinberg, Tardos KDD '03]

Key idea: Flip coins c in advance → "live" edges
Fc(A) = people influenced under outcome c
F(A) = ∑c P(c) Fc(A) is submodular as well!


The value of a friend

Let V be a group of individuals. How valuable to you is a given friend v ∈ V?

It depends on how many friends you have.

Given a group of friends S ⊆ V, can you value them with a function F(S), and how?

Let F(S) be the value of the set of friends S. Is submodular or supermodular a good model?


Information and Summarization

Let V be a set of information-containing elements (V might be words, sentences, documents, web pages, or blogs; each v ∈ V is one element, so v might be a word, a sentence, a document, etc.). The total amount of information in V is measured by a function F(V), and for any given subset S ⊆ V, the amount of information in S is given by F(S).

How informative is any given item v in different-sized contexts? Any such real-world information function would exhibit diminishing returns, i.e., the value of v decreases when it is considered in a larger context.

So a submodular function would likely be a good model.


Restriction

Restriction
If F(S) is submodular on V and W ⊆ V, then F′(S) = F(S ∩ W) is submodular.

Proof: Given A ⊆ B ⊆ V\{i}, we must show

F((A ∪ {i}) ∩ W) − F(A ∩ W) ≥ F((B ∪ {i}) ∩ W) − F(B ∩ W).

If i ∉ W, then both sides are zero. Suppose that i ∈ W; then (A ∪ {i}) ∩ W = (A ∩ W) ∪ {i} and (B ∪ {i}) ∩ W = (B ∩ W) ∪ {i}. Since A ∩ W ⊆ B ∩ W, the submodularity of F yields

F((A ∩ W) ∪ {i}) − F(A ∩ W) ≥ F((B ∩ W) ∪ {i}) − F(B ∩ W).


Conditioning

Conditioning
If F(S) is submodular on V and W ⊆ V, then F′(S) = F(S ∪ W) is submodular.


Reflection

Reflection
If F(S) is submodular on V, then F′(S) = F(V\S) is submodular.

Proof: Since V\(A ∪ B) = (V\A) ∩ (V\B) and V\(A ∩ B) = (V\A) ∪ (V\B), submodularity of F gives

F(V\A) + F(V\B) ≥ F(V\(A ∪ B)) + F(V\(A ∩ B))


Convex aspects

Submodularity as a discrete analogue of convexity

Convex extension

Duality

Polynomial time minimization!

A∗ = argmin_{A⊆V} F(A)

Many applications (computer vision, ML, · · · )


Concave aspects

Marginal gain: ΔF(s|A) = F({s} ∪ A) − F(A)

Submodular:

∀A ⊆ B, s ∉ B : F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B)

Concave:

∀a ≤ b, s > 0 : g(a + s) − g(a) ≥ g(b + s) − g(b)


Claim: if g is concave, then ∀a ≤ b, s > 0 : g(a + s) − g(a) ≥ g(b + s) − g(b)

Apply the concavity of g to the points a ≤ a + s ≤ b + s, writing a + s as a convex combination of a and b + s:

g(a + s) ≥ (b − a)/(b + s − a) g(a) + s/(b + s − a) g(b + s)

⇐⇒ g(a + s) − g(a) ≥ −s/(b + s − a) g(a) + s/(b + s − a) g(b + s)

Apply the concavity of g to the points a ≤ b ≤ b + s, writing b as a convex combination of a and b + s:

g(b) ≥ s/(b + s − a) g(a) + (b − a)/(b + s − a) g(b + s)

⇐⇒ g(b + s) − g(b) ≤ −s/(b + s − a) g(a) + s/(b + s − a) g(b + s)

Combining the two displayed conclusions gives g(a + s) − g(a) ≥ g(b + s) − g(b).


Submodularity and Concavity

Let m ∈ R^X_+ be a (nonnegative) modular function and g a concave function over R. Define F(A) = g(m(A)). Then F(A) is submodular.

Proof: Given A ⊆ B ⊆ X\{v}, we have 0 ≤ a = m(A) ≤ b = m(B) and 0 ≤ s = m(v). For g concave we have g(a + s) − g(a) ≥ g(b + s) − g(b), which implies

g(m(A) + m(v)) − g(m(A)) ≥ g(m(B) + m(v)) − g(m(B))
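The concave-of-modular construction is easy to observe numerically. A small sketch with made-up weights and g = log(1 + ·) (choices illustrative):

```python
import math

m = {"a": 1.0, "b": 2.0, "c": 3.5}               # modular weights m(v)
F = lambda A: math.log1p(sum(m[v] for v in A))   # g concave => F submodular

# diminishing returns: the gain of adding "c" shrinks as the context grows
gain_small = F({"a", "c"}) - F({"a"})
gain_large = F({"a", "b", "c"}) - F({"a", "b"})
print(gain_small >= gain_large)  # True
```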


Maximum of submodular functions

Suppose F1(A) and F2(A) are submodular. Is F(A) = max(F1(A), F2(A)) submodular?

max(F1,F2) not submodular in general!


Minimum of submodular functions

Well, maybe F(A) = min(F1(A), F2(A)) instead?

min(F1, F2) is not submodular in general either!


Max - normalized

Given V, let c ∈ R^V_+ be a given fixed vector. Then F : 2^V → R+, where

F(A) = maxj∈A cj

is submodular and normalized (we take F(∅) = 0).

Proof: Since

max( maxj∈A cj, maxj∈B cj ) = maxj∈A∪B cj

and

min( maxj∈A cj, maxj∈B cj ) ≥ maxj∈A∩B cj,

adding the two (using x + y = max(x, y) + min(x, y)) gives

maxj∈A cj + maxj∈B cj ≥ maxj∈A∪B cj + maxj∈A∩B cj


Monotone difference of two functions

Let F and G both be submodular functions on subsets of V and let (F − G)(·) be monotone (either increasing or decreasing). Then h : 2^V → R defined by h(A) = min(F(A), G(A)) is submodular.

Proof: If h agrees with the same one of F or G on both X and Y, the result follows since, e.g.,

F(X) + F(Y) ≥ F(X ∪ Y) + F(X ∩ Y) ≥ min(F(X ∪ Y), G(X ∪ Y)) + min(F(X ∩ Y), G(X ∩ Y))

Otherwise, w.l.o.g., h(X) = F(X) and h(Y) = G(Y), giving

h(X) + h(Y) = F(X) + G(Y) ≥ F(X ∪ Y) + F(X ∩ Y) + G(Y) − F(Y)

Assume w.l.o.g. that F − G is monotone increasing. Then (F − G)(X ∪ Y) ≥ (F − G)(Y), i.e., F(X ∪ Y) + G(Y) − F(Y) ≥ G(X ∪ Y), giving

h(X) + h(Y) ≥ G(X ∪ Y) + F(X ∩ Y) ≥ h(X ∪ Y) + h(X ∩ Y)


Min

Let F : 2^V → R be an increasing or decreasing submodular function and let k be a constant. Then the function h : 2^V → R defined by

h(A) = min(k, F(A))

is submodular.

In general, the minimum of two submodular functions is not submodular. However, when wishing to maximize two monotone non-decreasing submodular functions F and G, we can define the function h : 2^V → R as

h(A) = (1/2) ( min(k, F(A)) + min(k, G(A)) )

Then h is submodular, and h(A) ≥ k if and only if both F(A) ≥ k and G(A) ≥ k.


Outline

1 What is submodularity?
  Examples in recommendation sets
  Definition

2 Submodular maximization

3 Submodular minimization

4 Applications of submodular maximization


Submodular maximization under a Cardinality Constraint

Problem Definition
Given a non-decreasing and normalized submodular function F : 2^X → R+ on a finite ground set X with |X| = n, and an integer k ≤ n:

max F(A), s.t. |A| ≤ k

Greedy Algorithm
A0 ← ∅, set i = 0
While |Ai| < k:
    Choose s ∈ X maximizing F(Ai ∪ {s})
    Ai+1 ← Ai ∪ {s}, i ← i + 1


Greedy maximization is near-optimal

Theorem [Nemhauser, Fisher & Wolsey '78]
For monotone submodular functions, the greedy algorithm gives a constant factor approximation:

F(Agreedy) ≥ (1 − 1/e) F(A∗)   (≈ 63%)

The greedy algorithm gives a near-optimal solution!
For many submodular objectives this guarantee is the best possible unless P = NP.
Can also handle more complex constraints.


Contraction/Conditioning

Let F : 2^X → R and A ⊆ X. Define FA(S) = F(A ∪ S) − F(A).

Lemma: If F is monotone and submodular, then FA is monotone, submodular, and normalized for any A.

Proof: Normalized: FA(∅) = F(A) − F(A) = 0.

Monotone: let S ⊆ T; then FA(S) = F(A ∪ S) − F(A) ≤ F(A ∪ T) − F(A) = FA(T).

Submodular: let S, T ⊆ X:

FA(S) + FA(T) = F(S ∪ A) − F(A) + F(T ∪ A) − F(A)
≥ F(S ∪ T ∪ A) − F(A) + F((S ∩ T) ∪ A) − F(A)
= FA(S ∪ T) + FA(S ∩ T)


Lemma
If F is normalized and submodular, and A ⊆ X, then there is j ∈ A such that F({j}) ≥ (1/|A|) F(A).

Proof: If A1 and A2 partition A, i.e., A = A1 ∪ A2 and A1 ∩ A2 = ∅, then

F(A1) + F(A2) ≥ F(A1 ∪ A2) + F(A1 ∩ A2) = F(A)

using F(∅) = 0. Applying this recursively, we get

∑j∈A F({j}) ≥ F(A)

Therefore, maxj∈A F({j}) ≥ (1/|A|) F(A).


Greedy maximization is near-optimal

Theorem [Nemhauser, Fisher & Wolsey '78]
For monotone submodular functions, the greedy algorithm gives a constant factor approximation:

F(Agreedy) ≥ (1 − 1/e) F(A∗)

Proof: Let Ai be the working set of the algorithm after i steps, and let A∗ be an optimal solution (w.l.o.g. |A∗| = k).
We will show that the suboptimality F(A∗) − F(Ai) shrinks by a factor of (1 − 1/k) each iteration.
After k iterations, it has shrunk by (1 − 1/k)^k ≤ 1/e from its original value.
The algorithm chooses s ∈ X maximizing F(Ai ∪ {s}). Hence:

F(Ai+1) = F(Ai) + ( F(Ai ∪ {s}) − F(Ai) ) = F(Ai) + maxj FAi({j})


By our lemmas, there is j ∈ A∗ s.t.

FAi({j}) ≥ (1/|A∗|) FAi(A∗)   (apply the lemma to FAi)
= (1/k) ( F(Ai ∪ A∗) − F(Ai) )
≥ (1/k) ( F(A∗) − F(Ai) )   (by monotonicity)

Therefore

F(A∗) − F(Ai+1) = F(A∗) − F(Ai) − maxj FAi({j})
≤ (1 − 1/k) ( F(A∗) − F(Ai) )

and, iterating k times,

F(A∗) − F(Ak) ≤ (1 − 1/k)^k ( F(A∗) − F(∅) ) ≤ (1/e) F(A∗)


Scaling up the greedy algorithm [Minoux’78]

In round i + 1, having picked Ai = {s1, · · · , si},

pick si+1 = argmax_s F(Ai ∪ {s}) − F(Ai)

i.e., maximize the "marginal benefit" Δ(s|Ai):

Δ(s|Ai) = F(Ai ∪ {s}) − F(Ai)

Key observation: submodularity implies that marginal benefits can never increase!


"Lazy" greedy algorithm [Minoux’78]

Lazy greedy algorithm:
First iteration as usual
Keep an ordered list of marginal benefits Δi from the previous iteration
Re-evaluate Δi only for the top element
If Δi stays on top, use it; otherwise re-sort

Note: very easy to compute online bounds, lazy evaluations, etc. [Leskovec, Krause et al. '07]
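The lazy greedy idea can be sketched with a max-heap of stale marginal gains. A minimal illustration on a toy coverage function (data and names illustrative, not from the slides):

```python
import heapq

def lazy_greedy(F, V, k):
    """Lazy greedy: since marginal gains never increase, a stale gain from
    an earlier round is a valid upper bound, deferring most re-evaluations."""
    A, fA = [], F([])
    heap = [(-(F([v]) - fA), v) for v in V]   # (negated gain, element)
    heapq.heapify(heap)
    while len(A) < k and heap:
        neg_gain, v = heapq.heappop(heap)
        fresh = F(A + [v]) - fA               # re-evaluate only the top element
        if not heap or fresh >= -heap[0][0] - 1e-12:
            A.append(v)                       # still on top: take it
            fA += fresh
        else:
            heapq.heappush(heap, (-fresh, v)) # otherwise re-insert and retry
    return A

sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}}
F = lambda A: len(set().union(*(sets[i] for i in A))) if A else 0
print(lazy_greedy(F, list(sets), 2))  # ['a', 'c']
```

Only the top candidate is re-evaluated each round; "b" is skipped once its stale bound falls below the fresh gain of "c".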


Empirical improvements [Leskovec, Krause et al’06]


Outline

1 What is submodularity?
  Examples in recommendation sets
  Definition

2 Submodular maximization

3 Submodular minimization

4 Applications of submodular maximization


Optimizing Submodular Functions

As our examples suggest, optimization problems involvingsubmodular functions are very common

These can be classified on two axes: constrained/unconstrainedand maximization/minimization

                 Maximization                            Minimization
Unconstrained    NP-hard; 1/2 approximation              Polynomial time via convex opt
Constrained      Usually NP-hard; 1 − 1/e (monotone,     Usually NP-hard to approximate;
                 matroid); O(1) ("nice" constraints)     few easy special cases

Representation
In order to generalize all our examples, algorithmic results are often posed in the value oracle model. Namely, we only assume we have access to a subroutine evaluating F(S).


Problem Definition
Given a submodular function F : 2^X → R on a finite ground set X,

min F(S), s.t. S ⊆ X

We denote n = |X|.

We assume each value F(S) is a rational number with at most b bits.

Representation: in order to generalize all our examples, algorithmic results are often posed in the value oracle model. Namely, we only assume we have access to a subroutine evaluating F(S) in constant time.

Goal
An algorithm which runs in time polynomial in n and b.


Some more notations

E = {1, 2, . . . , n}

R^E = {x = (xj ∈ R : j ∈ E)}

R^E_+ = {x = (xj ∈ R : j ∈ E) : x ≥ 0}

Any vector x ∈ R^E can be treated as a normalized modular function, and vice versa. That is,

x(A) = ∑a∈A xa

Note that x is said to be normalized since x(∅) = 0.

Given A ⊆ E, define the vector 1A ∈ R^E_+ by

1A(j) = 1 if j ∈ A, 0 if j ∉ A

Given a modular function x ∈ R^E, we can write x(A) in a variety of ways, i.e., x(A) = x · 1A = ∑i∈A xi


Continuous Extensions of a Set Function

A set function F on X = {1, . . . , n} can be thought of as a map from the vertices {0, 1}^n of the n-dimensional hypercube to the real numbers, with F(1A) = F(A) for all A ⊆ X, so that

minA⊆X F(A) = minw∈{0,1}^n F(w)

Extension of a Set Function
Given a set function F : {0, 1}^n → R, an extension of F to the hypercube [0, 1]^n is a function g : [0, 1]^n → R satisfying g(x) = F(x) for every x ∈ {0, 1}^n.


Choquet integral - Lovász extension

Subsets may be identified with elements of {0, 1}^n

Given any set function F and w such that wj1 ≥ . . . ≥ wjn, define

f(w) = ∑_{k=1}^{n} wjk [ F({j1, . . . , jk}) − F({j1, . . . , jk−1}) ]
     = ∑_{k=1}^{n−1} (wjk − wjk+1) F({j1, . . . , jk}) + wjn F({j1, . . . , jn})

If w = 1A, then f(w) = F(A) =⇒ extension from {0, 1}^n to R^n
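The sorting formula above translates directly into code. A sketch evaluating the Lovász extension of a toy coverage function (data illustrative):

```python
def lovasz_extension(F, w):
    """f(w) = sum_k w_{j_k} [F({j_1..j_k}) - F({j_1..j_{k-1}})],
    where the coordinates are sorted so that w_{j_1} >= ... >= w_{j_n}."""
    order = sorted(w, key=w.get, reverse=True)
    f, prefix, prev = 0.0, set(), F(set())
    for j in order:
        prefix.add(j)
        cur = F(prefix)
        f += w[j] * (cur - prev)   # marginal gain of j, weighted by w_j
        prev = cur
    return f

sets = {"a": {1, 2}, "b": {2, 3}, "c": {3}}
F = lambda A: len(set().union(*(sets[i] for i in A))) if A else 0

# on an indicator vector, f agrees with F: f(1_{a,b}) = F({a,b}) = 3
print(lovasz_extension(F, {"a": 1.0, "b": 1.0, "c": 0.0}))  # 3.0
print(lovasz_extension(F, {"a": 1.0, "b": 0.5, "c": 0.0}))  # 2.5
```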


Choquet integral - Lovász extension, example: p = 2

If w1 ≥ w2, f (w) = F({1})w1 + [F({1, 2})− F({1})]w2

If w1 ≤ w2, f (w) = F({2})w2 + [F({1, 2})− F({2})]w1

The level set {w ∈ R², f(w) = 1} is displayed (in blue) in the original slides.

Compact formulation: f (w) =[F({1, 2})− F({1})− F({2})] min(w1,w2) + F({1})w1 + F({2})w2


Links with convexity

Theorem (Lovász, 1982)
F is submodular if and only if f is convex.

Proof requires: Submodular and base polyhedra

Submodular polyhedron: P(F) = {s ∈ Rn, ∀A ⊆ V, s(A) ≤ F(A)}

Base polyhedron: B(F) = P(F) ∩ {s(V) = F(V)}


Submodular and base polyhedra

P(F) has non-empty interior. Many facets (up to 2^n), many extreme points (up to n!).

Fundamental property (Edmonds, 1970): if F is submodular, maximizing linear functions over P(F) may be done by a "greedy algorithm":

Let w ∈ R^n_+ be such that wj1 ≥ . . . ≥ wjn

Let sjk = F({j1, . . . , jk}) − F({j1, . . . , jk−1}) for k ∈ {1, . . . , n}

Then

f(w) = max_{s∈P(F)} w⊤s = max_{s∈B(F)} w⊤s

and both maxima are attained at the s defined above.

Proofs: pages 41-44 in http://bicmr.pku.edu.cn/~wenzw/bigdata/submodular_fbach_mlss2012.pdf


Links with convexity

Theorem (Lovász, 1982)
F is submodular if and only if f is convex.

If F is submodular, f is a maximum of linear functions, hence f is convex.

If f is convex, let A, B ⊆ V. The vector 1A∪B + 1A∩B = 1A + 1B has components equal to 0 (on V\(A ∪ B)), 2 (on A ∩ B) and 1 (on A∆B = (A\B) ∪ (B\A)).

Writing out the definition of f(w) for this vector gives f(1A∪B + 1A∩B) = F(A ∪ B) + F(A ∩ B).

By positive homogeneity and convexity of f, f(1A + 1B) ≤ f(1A) + f(1B) = F(A) + F(B), and thus F is submodular.


Links with convexity

Theorem (Lovász, 1982)
If F is submodular, then

minA⊆V F(A) = minw∈{0,1}^n f(w) = minw∈[0,1]^n f(w)

Since f is an extension of F,

minA⊆V F(A) = minw∈{0,1}^n f(w) ≥ minw∈[0,1]^n f(w)

Conversely, any w ∈ [0, 1]^n can be decomposed as w = ∑_{i=1}^{m} λi 1Bi, where B1 ⊆ . . . ⊆ Bm = V, λ ≥ 0 and λ(V) ≤ 1 (the construction is on the next slide).

Since minA⊆V F(A) ≤ 0 (because F(∅) = 0),

f(w) = ∑_{i=1}^{m} λi F(Bi) ≥ ∑_{i=1}^{m} λi minA⊆V F(A) = λ(V) minA⊆V F(A) ≥ minA⊆V F(A)

Thus minw∈[0,1]^n f(w) ≥ minA⊆V F(A).


Links with convexity

For any w ∈ [0, 1]^n, sort wj1 ≥ . . . ≥ wjn and choose λ ≥ 0 such that

∑_{k=1}^{n} λjk = wj1,  ∑_{k=2}^{n} λjk = wj2,  . . . ,  λjn = wjn

i.e., λjk = wjk − wjk+1 for k < n and λjn = wjn, with

B1 = {j1}, B2 = {j1, j2}, . . . , Bn = {j1, j2, . . . , jn}

Then we have w = ∑_{k=1}^{n} λjk 1Bk, where B1 ⊆ . . . ⊆ Bn = V, λ ≥ 0 and λ(V) = ∑i∈V λi = wj1 ≤ 1.


Submodular function minimization

Let F : 2^V → R be a submodular function (with F(∅) = 0).

Convex duality:

minA⊆V F(A) = minw∈[0,1]^n f(w)
= minw∈[0,1]^n maxs∈B(F) w⊤s
= maxs∈B(F) minw∈[0,1]^n w⊤s
= maxs∈B(F) s−(V)

where s−(V) = ∑j∈V min(sj, 0).


Submodular function minimization

Convex optimization
If F is submodular, then

minA⊆V F(A) = minw∈{0,1}^n f(w) = minw∈[0,1]^n f(w)

Use projected subgradient descent to minimize f on [0, 1]^n.

Iteration: wt = Π[0,1]^n ( wt−1 − (C/√t) st ), where st ∈ ∂f(wt−1)

A subgradient is available since f(w) = maxs∈B(F) w⊤s.

Standard convergence results from convex optimization:

f(wt) − minw∈[0,1]^n f(w) ≤ C/√t
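The scheme above can be sketched end to end: Edmonds' greedy procedure supplies a subgradient (the vector of sorted marginal gains), and projection onto [0, 1]^n is coordinatewise clipping. This is a toy illustration, not a production minimizer; the objective is a modular (hence submodular) function with a hand-picked negative weight, and the final thresholding at 0.5 is a simplification:

```python
import math

def lovasz_subgradient(F, w):
    """Greedy subgradient of the Lovász extension: sort coordinates of w
    decreasingly and take the marginal gains of F along that chain."""
    order = sorted(w, key=lambda j: w[j], reverse=True)
    s, prefix, prev = {}, set(), F(set())
    for j in order:
        prefix.add(j)
        cur = F(prefix)
        s[j] = cur - prev
        prev = cur
    return s

def minimize(F, V, iters=200, C=1.0):
    """Projected subgradient descent on f over [0,1]^n, rounded by threshold."""
    w = {j: 0.5 for j in V}
    for t in range(1, iters + 1):
        s = lovasz_subgradient(F, w)
        step = C / math.sqrt(t)
        w = {j: min(1.0, max(0.0, w[j] - step * s[j])) for j in V}
    return {j for j in V if w[j] > 0.5}

# toy objective: element 1 has negative weight, so the minimizer is {1}
weights = {1: -1.0, 2: 1.0, 3: 2.0}
F = lambda A: sum(weights[j] for j in A)
print(minimize(F, set(weights)))  # {1}
```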


Outline

1 What is submodularity?
  Examples in recommendation sets
  Definition

2 Submodular maximization

3 Submodular minimization

4 Applications of submodular maximization


Question

I have 10 minutes. Which blogs should I read to be most up to date? [Leskovec, Krause, Guestrin, Faloutsos, VanBriesen, Glance '07]


Detecting Cascades in the Blogosphere

Which blogs should we read to learn about big cascades early?


Reward function is submodular

Consider cascade i:
Fi(sk) = benefit from reading blog sk in cascade i
Fi(A) = max_{sk∈A} Fi(sk)
⇒ Fi is submodular

Overall objective:

F(A) = (1/m) ∑_{i=1}^{m} Fi(A)

⇒ F is submodular

⇒ Can use the greedy algorithm to solve max|A|≤k F(A)!


Performance on Blog selection

Submodular formulation outperforms heuristics.
Lazy greedy gives a 700x speedup.


Application: Network inference

How can we learn who influences whom?


Inferring diffusion networks[Gomez Rodriguez, Leskovec, Krause ACM TKDE 2012]

Given traces of influence, we wish to infer a sparse directed network G = (V,E) ⇒ formulate as an optimization problem:

E∗ = argmax_{|E|≤k} F(E)


Estimation problem

Many influence trees T are consistent with the data. For cascade Ci, model P(Ci|T).

Find the sparse graph that maximizes the likelihood of all observed cascades:

F(E) = ∑i log max_{tree T⊆E} P(Ci|T)

⇒ The log-likelihood is monotone submodular in the selected edges.


Evaluation: Synthetic networks

Performance does not depend on the network structure:
Synthetic networks: Forest Fire, Kronecker, etc.
Transmission time distributions: Exponential, Power Law

Break-even point of > 90%


Inferred Diffusion Network[Gomez Rodriguez, Leskovec, Krause ACM TKDE 2012]

Actual network inferred from 172 million articles from 1 million news sources


Diffusion Network (small part)


Application: Document summarization[Lin & Bilmes’11]

Which sentences should we select that best summarize adocument?


Marginal gain of a sentence

Many natural notions of “document coverage” are submodular[Lin & Bilmes’11]


Relevance of a summary

F(S) = R(S) + λ D(S)

R(S): relevance;  D(S): diversity

R(S) = ∑i min{ Ci(S), α Ci(V) }

Ci(S) = ∑j∈S ωi,j

Ci(S): how well sentence i is "covered" by S
ωi,j: similarity between sentences i and j


Diversity of a summary

D(S) = ∑_{i=1}^{K} √( ∑_{j∈Pi∩S} rj )

rj = (1/N) ∑i ωi,j

rj: relevance of sentence j to the document
ωi,j: similarity between i and j
P1, . . . , PK: a partition of the sentences into clusters

Can be made query-specific; multi-resolution; etc.
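The relevance-plus-diversity objective fits in a few lines. A toy sketch (the similarity matrix, clustering, and parameters α, λ are all made up for illustration):

```python
import math

# toy symmetric similarity matrix between 4 sentences (made-up numbers)
omega = [
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.2],
    [0.1, 0.1, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
V = range(4)
clusters = [{0, 1}, {2, 3}]   # partition P_1, ..., P_K
alpha, lam = 0.5, 1.0

def C(i, S):
    """C_i(S): how well sentence i is covered by S."""
    return sum(omega[i][j] for j in S)

def relevance(S):
    return sum(min(C(i, S), alpha * C(i, V)) for i in V)

def diversity(S):
    r = [sum(omega[i][j] for i in V) / len(V) for j in V]   # r_j
    return sum(math.sqrt(sum(r[j] for j in P & S)) for P in clusters)

def F(S):
    return relevance(S) + lam * diversity(S)

# the diverse pair {0, 2} beats the redundant pair {0, 1}
print(F({0, 2}) > F({0, 1}))  # True
```

The square root rewards spreading the summary over different clusters, which is exactly the diversity effect described above.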


Summary

Many problems of recommending sets can be cast as submodular maximization.
The greedy algorithm gives a near-optimal set of size k.
Lazy evaluations can be used to speed it up.
Approximate submodular maximization is possible under a variety of constraints:
    Matroid
    Knapsack
    Multiple matroid and knapsack constraints
    Path constraints (submodular orienteering)
    Connectedness (submodular Steiner)
    Robustness (minimax)