
Page 1: Learning  Submodular  Functions

Learning Submodular Functions

Nick Harvey, University of Waterloo

Joint work with Nina Balcan, Georgia Tech

Page 2: Learning  Submodular  Functions

Submodular functions. V = {1, 2, …, n}

f : 2^V → R

Submodularity:

f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T)   ∀ S, T ⊆ V

Equivalent: decreasing marginal values:

f(S ∪ {x}) − f(S) ≥ f(T ∪ {x}) − f(T)   ∀ S ⊆ T ⊆ V, x ∉ T

Examples (a short code sketch follows below):

• Concave functions: let h : R → R be concave. For each S ⊆ V, let f(S) = h(|S|).

• Vector spaces: let V = {v_1, …, v_n}, each v_i ∈ R^n. For each S ⊆ V, let f(S) = rank(V[S]).
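Since the slide only states these definitions, here is a minimal Python sketch (not from the talk) that brute-forces the submodularity inequality for the two examples above; the helper names are illustrative.

# Minimal sketch: brute-force submodularity check for the two example functions above.
from itertools import chain, combinations
import math
import numpy as np

def powerset(ground):
    return chain.from_iterable(combinations(ground, r) for r in range(len(ground) + 1))

def is_submodular(f, ground):
    """Check f(S) + f(T) >= f(S & T) + f(S | T) for all S, T (tiny ground sets only)."""
    subsets = [frozenset(s) for s in powerset(ground)]
    return all(f(S) + f(T) >= f(S & T) + f(S | T) - 1e-9
               for S in subsets for T in subsets)

# Example 1: concave function of cardinality, f(S) = h(|S|) with h = sqrt.
concave_of_cardinality = lambda S: math.sqrt(len(S))

# Example 2: rank of a set of vectors, f(S) = rank(V[S]).
vectors = {1: np.array([1.0, 0.0, 0.0]), 2: np.array([0.0, 1.0, 0.0]),
           3: np.array([1.0, 1.0, 0.0]), 4: np.array([0.0, 0.0, 1.0])}
rank_of_vectors = lambda S: int(np.linalg.matrix_rank(
    np.column_stack([vectors[i] for i in S]))) if S else 0

ground = frozenset(vectors)
print(is_submodular(concave_of_cardinality, ground))  # True
print(is_submodular(rank_of_vectors, ground))         # True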

Page 3: Learning  Submodular  Functions

Submodular functions. V = {1, 2, …, n}

f : 2^V → R

Non-negative: f(S) ≥ 0, ∀ S ⊆ V

Monotone: f(S) ≤ f(T), ∀ S ⊆ T

Submodularity:

f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T)   ∀ S, T ⊆ V

Equivalent: decreasing marginal values:

f(S ∪ {x}) − f(S) ≥ f(T ∪ {x}) − f(T)   ∀ S ⊆ T ⊆ V, x ∉ T

Page 4: Learning  Submodular  Functions

Submodular functions

• Strong connection between optimization and submodularity
  • e.g. minimization [C'85, GLS'87, IFF'01, S'00, …], maximization [NWF'78, V'07, …]

• Much interest in the Machine Learning community recently
  • Tutorials at major conferences: ICML, NIPS, etc.
  • www.submodularity.org is a Machine Learning site

• Algorithmic game theory
  • Submodular utility functions

• Interesting to understand their learnability

Page 5: Learning  Submodular  Functions

Exact Learning with value queries (Goemans, Harvey, Iwata, Mirrokni, SODA 2009)

f : {0,1}^n → R

• Algorithm adaptively queries x_i and receives value f(x_i), for i = 1, …, q, where q = poly(n).

• Algorithm produces "hypothesis" g. (Hopefully g ≈ f)

• Goal: g(x) ≤ f(x) ≤ α·g(x)   ∀ x ∈ {0,1}^n, with α as small as possible

[Diagram: the algorithm sends queries x_1, x_2, … and receives values f(x_1), f(x_2), …, then outputs g : {0,1}^n → R]

Page 6: Learning  Submodular  Functions

Exact Learning with value queries (Goemans, Harvey, Iwata, Mirrokni, SODA 2009)

• Algorithm adaptively queries x_i and receives value f(x_i), for i = 1, …, q

• Algorithm produces "hypothesis" g. (Hopefully g ≈ f)

• Goal: g(x) ≤ f(x) ≤ α·g(x)   ∀ x ∈ {0,1}^n, with α as small as possible

Theorem (Upper bound): there is an algorithm for learning a submodular function with α = Õ(n^{1/2}).

Theorem (Lower bound): any algorithm for learning a submodular function must have α = Ω̃(n^{1/2}).

Page 7: Learning  Submodular  Functions

Problems with this model

• In learning theory, we usually only try to predict the value of most points

• The GHIM lower bound fails if the goal is to do well on most of the points

• To define "most" we need a distribution on {0,1}^n

Is there a distributional model for learning submodular functions?

Page 8: Learning  Submodular  Functions

Our Model

Distribution D on {0,1}^n

f : {0,1}^n → R+

• Algorithm sees examples (x_1, f(x_1)), …, (x_q, f(x_q)) where the x_i's are i.i.d. from distribution D

• Algorithm produces "hypothesis" g. (Hopefully g ≈ f)

[Diagram: the algorithm receives examples x_i with values f(x_i) and outputs g : {0,1}^n → R+]

Page 9: Learning  Submodular  Functions

Our Model

Distribution D on {0,1}^n

f : {0,1}^n → R+

• Algorithm sees examples (x_1, f(x_1)), …, (x_q, f(x_q)) where the x_i's are i.i.d. from distribution D

• Algorithm produces "hypothesis" g. (Hopefully g ≈ f)

• Pr_{x_1,…,x_q}[ Pr_x[ g(x) ≤ f(x) ≤ α·g(x) ] ≥ 1−ε ] ≥ 1−δ

• "Probably Mostly Approximately Correct" (a small empirical check is sketched below)

[Diagram: on a new point x drawn from D, is f(x) ≈ g(x)?]
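As a concrete reading of the PMAC guarantee, here is a hedged Python sketch (not from the talk) of how one might estimate the inner probability on a held-out sample; the function name and arguments are assumptions of this sketch.

def empirical_pmac_fraction(f, g, samples, alpha):
    """Fraction of sampled points x with g(x) <= f(x) <= alpha * g(x)."""
    good = sum(1 for x in samples if g(x) <= f(x) <= alpha * g(x))
    return good / len(samples)

# Usage sketch: if this fraction is >= 1 - eps on a fresh sample from D,
# then g is (empirically) "mostly approximately correct" with factor alpha.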

Page 10: Learning  Submodular  Functions

Our Model

Distribution D on {0,1}^n

f : {0,1}^n → R+

• "Probably Mostly Approximately Correct"

• Impossible if f is arbitrary and # training points ≪ 2^n

• Possible if f is a non-negative, monotone, submodular function

[Diagram: on a new point x drawn from D, is f(x) ≈ g(x)?]

Page 11: Learning  Submodular  Functions

Example: Concave Functions

• Concave functions: let h : R → R be concave.

[Plot of a concave function h]

Page 12: Learning  Submodular  Functions

Example: Concave Functions

• Concave functions: let h : R → R be concave. For each S ⊆ V, let f(S) = h(|S|).

• Claim: f is submodular.

• We prove a partial converse.

[Plot: f(S) = h(|S|) as S grows from ∅ to V]

Page 13: Learning  Submodular  Functions

Theorem (informal): lots of submodular functions approximately look like this, usually.

[Plot: a concave-shaped curve over set sizes from ∅ to V]

Page 14: Learning  Submodular  Functions

Theorem (informal): lots of submodular functions approximately look like this, usually.

Theorem: Let f be a non-negative, monotone, submodular, 1-Lipschitz function. There exists a concave function h : [0,n] → R such that, for any ε > 0, for every k ∈ {0,…,n}, and for a 1−ε fraction of S ⊆ V with |S| = k, we have:

h(k) ≤ f(S) ≤ O(log²(1/ε))·h(k).

In fact, h(k) is just E[ f(S) ], where S is uniform on sets of size k.

Proof: based on Talagrand's Inequality.

[Plot: a matroid rank function over set sizes from ∅ to V, concentrated around a concave curve]

Page 15: Learning  Submodular  Functions

Learning Submodular Functions under any product distribution

Product distribution D on {0,1}^n

f : {0,1}^n → R+

[Diagram: examples x_i drawn from D with values f(x_i); the algorithm outputs g : {0,1}^n → R+]

• Algorithm: let μ = (1/q) Σ_{i=1}^{q} f(x_i)

• Let g be the constant function with value μ (a small code sketch follows below)

• This achieves approximation factor O(log²(1/ε)) on a 1−ε fraction of points, with high probability.

• Proof: essentially follows from the previous theorem.
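A minimal Python sketch of this constant-hypothesis algorithm, assuming the training data is handed to us as (x_i, f(x_i)) pairs; the names are illustrative, and the O(log²(1/ε)) guarantee above relies on D being a product distribution.

def learn_constant_hypothesis(labeled_examples):
    """labeled_examples: list of (x_i, f(x_i)) pairs drawn i.i.d. from D."""
    values = [fx for _, fx in labeled_examples]
    mu = sum(values) / len(values)    # empirical mean of the observed values
    return lambda x: mu               # the constant function g with value mu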

Page 16: Learning  Submodular  Functions

Learning Submodular Functions under an arbitrary distribution?

• The same argument no longer works: Talagrand's inequality requires a product distribution.

• Intuition: a non-uniform distribution focuses on fewer points, so the function is less concentrated on those points.

[Plot: a function over set sizes from ∅ to V with deep dips at a few points]

Page 17: Learning  Submodular  Functions

A General Upper Bound?

• Theorem (our upper bound): there is an algorithm for learning a submodular function w.r.t. an arbitrary distribution that has approximation factor O(n^{1/2}).

Page 18: Learning  Submodular  Functions

Computing Linear Separators

[Figure: points in the plane labeled + and −, separated by a hyperplane]

• Given {+,−}-labeled points in R^n, find a hyperplane c^T x = b that separates the +s and −s.

• Easily solved by linear programming (a small LP sketch follows below).
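A hedged sketch of the LP formulation mentioned on this slide, using scipy.optimize.linprog as a generic LP solver. Rephrasing strict separation with a margin of 1 is a standard trick but an assumption of this sketch, not a detail from the slide.

import numpy as np
from scipy.optimize import linprog

def separating_hyperplane(pos, neg):
    """pos, neg: arrays of shape (m, n). Returns (c, b) with c.x - b >= 1 on pos
    and c.x - b <= -1 on neg, or None if no separating hyperplane exists."""
    n = pos.shape[1]
    # Variables z = (c_1, ..., c_n, b); feasibility LP with a zero objective.
    A_ub = np.vstack([np.hstack([-pos, np.ones((len(pos), 1))]),    # -(c.x - b) <= -1
                      np.hstack([neg, -np.ones((len(neg), 1))])])   #  (c.x - b) <= -1
    b_ub = -np.ones(len(pos) + len(neg))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return (res.x[:n], res.x[n]) if res.success else None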

Page 19: Learning  Submodular  Functions

Learning Linear Separators

[Figure: randomly sampled + and − points with a separating hyperplane; a few points fall on the wrong side ("Error!")]

• Given a random sample of {+,−}-labeled points in R^n, find a hyperplane c^T x = b that separates most of the +s and −s.

• Classic machine learning problem.

Page 20: Learning  Submodular  Functions

Learning Linear Separators

[Figure: randomly sampled + and − points with a separating hyperplane; a few points fall on the wrong side ("Error!")]

• Classic Theorem [Vapnik-Chervonenkis 1971?]: Õ(n/ε²) samples suffice to get error ε.

Page 21: Learning  Submodular  Functions

Submodular Functions are Approximately Linear

• Let f be non-negative, monotone and submodular

• Claim: f can be approximated to within factor n by a linear function g.

• Proof Sketch: let g(S) = Σ_{s∈S} f({s}). Then f(S) ≤ g(S) ≤ n·f(S). (A small code sketch follows below.)

Submodularity: f(S) + f(T) ≥ f(S ∩ T) + f(S ∪ T)  ∀ S, T ⊆ V
Monotonicity: f(S) ≤ f(T)  ∀ S ⊆ T
Non-negativity: f(S) ≥ 0  ∀ S ⊆ V
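A small Python sketch of the proof idea above: the additive function g built from singleton values sandwiches f between f(S) and n·f(S). Names are illustrative.

def additive_approximation(f, ground):
    """Return g(S) = sum over s in S of f({s})."""
    singleton = {s: f(frozenset([s])) for s in ground}
    return lambda S: sum(singleton[s] for s in S)

# For non-negative, monotone, submodular f on a ground set of size n:
#   f(S) <= g(S)      (decreasing marginal values, summed over the elements of S)
#   g(S) <= n * f(S)  (each f({s}) <= f(S) by monotonicity, and |S| <= n terms)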

Page 22: Learning  Submodular  Functions

Submodular Functions are Approximately Linear

[Plot over subsets of V: the curves f, g, and n·f, with f ≤ g ≤ n·f]

Page 23: Learning  Submodular  Functions

[Plot: sampled points at height f(S_i) labeled + and at height n·f(S_i) labeled −, separated by the linear function g]

• Randomly sample {S_1, …, S_q} from the distribution

• Create a + label for f(S_i) and a − label for n·f(S_i)

• Now just learn a linear separator! (See the sketch below.)
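A hedged sketch of the reduction described on this slide: embed each sampled set S_i as its indicator vector plus one extra "value" coordinate, label the point at height f(S_i) as + and the point at height n·f(S_i) as −, then hand the labeled points to any linear-separator learner (for instance the LP sketch from the "Computing Linear Separators" slide). How the separator is turned back into the hypothesis g is omitted here; the names are illustrative.

import numpy as np

def build_separator_dataset(f, sampled_sets, ground):
    """Return (+)-points and (-)-points in R^{n+1} for the reduction above."""
    order = sorted(ground)
    n = len(order)
    pos, neg = [], []
    for S in sampled_sets:
        chi = np.array([1.0 if e in S else 0.0 for e in order])  # indicator of S
        pos.append(np.append(chi, f(S)))        # (chi_S, f(S)) labeled +
        neg.append(np.append(chi, n * f(S)))    # (chi_S, n * f(S)) labeled -
    return np.array(pos), np.array(neg)

# pos, neg = build_separator_dataset(f, sampled_sets, ground)
# c, b = separating_hyperplane(pos, neg)   # e.g. the earlier LP sketch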

Page 24: Learning  Submodular  Functions

[Plot: f sandwiched between g and n·f over subsets of V]

• Theorem: g approximates f to within a factor n on a 1−ε fraction of the distribution.

• Can improve to factor O(n^{1/2}) by the GHIM lemma: ellipsoidal approximation of submodular functions.

Page 25: Learning  Submodular  Functions

A Lower Bound?

• A non-uniform distribution focuses on fewer points, so the function is less concentrated on those points

• Can we create a submodular function with lots of deep "bumps"?

• Yes!

[Plot: a function over set sizes from ∅ to V with many deep bumps]

Page 26: Learning  Submodular  Functions

A General Lower Bound

Plan:
• Use the fact that matroid rank functions are submodular.
• Construct a hard family of matroids.
• Pick A_1, …, A_m ⊂ V with |A_i| = n^{1/3} and m = n^{log n}

[Figure: sets A_1, A_2, A_3, …, A_m; on each A_i the function value is either Low = log² n or High = n^{1/3}]

Theorem (our general lower bound): no algorithm can PMAC-learn the class of non-negative, monotone, submodular functions with an approximation factor õ(n^{1/3}).

Page 27: Learning  Submodular  Functions

Matroids

• Ground set V

• Family of independent sets I

• Axioms:
  • ∅ ∈ I  ("nonempty")
  • J ⊂ I ∈ I ⇒ J ∈ I  ("downwards closed")
  • J, I ∈ I and |J| < |I| ⇒ ∃ x ∈ I∖J s.t. J+x ∈ I  ("maximum-size sets can be found greedily")

• Rank function: r(S) = max { |I| : I ∈ I and I ⊆ S } (a greedy code sketch follows below)
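A minimal Python sketch (not from the talk) of this rank function, computed greedily from an independence oracle; by the exchange axiom, greedily growing an independent subset of S yields one of maximum size.

def matroid_rank(is_independent, S):
    """r(S) = max{ |I| : I independent, I subset of S }, via the greedy algorithm."""
    I = set()
    for x in S:
        if is_independent(I | {x}):
            I.add(x)
    return len(I)

# Example: uniform matroid of rank k (independent sets = all sets of size <= k).
# k = 3
# print(matroid_rank(lambda I: len(I) <= k, {1, 2, 3, 4, 5}))  # 3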

Page 28: Learning  Submodular  Functions

r(S) = |S| (if |S| ≤ k), k (otherwise); equivalently f(S) = min{ |S|, k }

[Plot: r(S) over set sizes from ∅ to V, increasing linearly and then flat at k]

Page 29: Learning  Submodular  Functions

[Plot: r(S) over set sizes from ∅ to V, with a single dip at the set A]

r(S) = |S| (if |S| ≤ k and S ≠ A), k−1 (if S = A), k (otherwise)

Page 30: Learning  Submodular  Functions

[Plot: r(S) over set sizes from ∅ to V, with dips at each of A_1, A_2, A_3, …, A_m]

r(S) = |S| (if |S| ≤ k and S ∉ A), k−1 (if S ∈ A), k (otherwise)

A = {A_1, …, A_m}, |A_i| = k ∀ i

Claim: r is submodular if |A_i ∩ A_j| ≤ k−2 ∀ i ≠ j.
r is the rank function of a "paving matroid". (A small code sketch of this rank function follows below.)
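A hedged Python sketch that just transcribes the case analysis above for a family A of k-element sets with pairwise intersections of size at most k−2; it evaluates the displayed rank formula and is not an independence-oracle implementation of the matroid itself.

def paving_style_rank(S, A_family, k):
    """Rank formula from the slide: |S| if small, k-1 on the 'bumps' in A, else k."""
    S = frozenset(S)
    if S in A_family:            # S is one of the bumps A_i
        return k - 1
    return min(len(S), k)        # |S| when |S| <= k, and k otherwise

# A_family = {frozenset({1, 2, 3}), frozenset({3, 4, 5})}   # k = 3, overlaps <= k - 2
# print(paving_style_rank({1, 2, 3}, A_family, 3))  # 2
# print(paving_style_rank({1, 2, 4}, A_family, 3))  # 3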

Page 31: Learning  Submodular  Functions

[Plot: r(S) over set sizes from ∅ to V, with dips at each of A_1, A_2, A_3, …, A_m]

r(S) = |S| (if |S| ≤ k and S ∉ A), k−1 (if S ∈ A), k (otherwise)

A = {A_1, …, A_m}, |A_i| = k ∀ i, |A_i ∩ A_j| ≤ k−2 ∀ i ≠ j

Page 32: Learning  Submodular  Functions

[Plot: bumps at A_1, A_2, A_3, …, A_m, with half of them deleted. If the algorithm sees only examples from the surviving bumps, then f can't be predicted on the deleted ones.]

r(S) = |S| (if |S| ≤ k and S ∉ A), k−1 (if S ∈ A and wasn't deleted), k (otherwise)

Delete half of the bumps at random. If m is large, the algorithm cannot learn which were deleted ⇒ any algorithm to learn f has additive error 1.

Page 33: Learning  Submodular  Functions

[Plot: bumps at A_1, A_2, A_3, …, A_m over set sizes from ∅ to V]

Can we force a bigger error with bigger bumps?

Yes!
• Need to generalize paving matroids
• A needs to have very strong properties

Page 34: Learning  Submodular  Functions

The Main Question

• Let V = A_1 ∪ ⋯ ∪ A_m and b_1, …, b_m ∈ N

• Is there a matroid s.t.
  • r(A_i) ≤ b_i ∀ i
  • r(S) is "as large as possible" for S ⊉ A_i (this is not formal)

• If Ai’s are disjoint, solution is partition matroid

• If Ai’s are “almost disjoint”, can we find a matroid that’s “almost” a partition matroid?

Next: formalize this

Page 35: Learning  Submodular  Functions

Lossless Expander Graphs

• Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if
  – every u ∈ U has degree D
  – |Γ(S)| ≥ (1−ε)·D·|S|  ∀ S ⊆ U with |S| ≤ K,
  where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }

"Every small left-set has a nearly-maximal number of right-neighbors"

[Diagram: a bipartite graph with left vertices U and right vertices V]
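A hedged Python sketch (not from the talk) that checks the lossless-expansion condition above for a small bipartite graph, given as a dict mapping each left vertex to its set of right-neighbors; it brute-forces all left-sets of size at most K, so it is only meant for tiny examples.

from itertools import combinations

def is_lossless_expander(neighbors, D, K, eps):
    """neighbors: dict u -> set of right-neighbors Gamma(u)."""
    if any(len(Nu) != D for Nu in neighbors.values()):
        return False                      # every left vertex must have degree exactly D
    left = list(neighbors)
    for size in range(1, K + 1):
        for S in combinations(left, size):
            gamma = set().union(*(neighbors[u] for u in S))
            if len(gamma) < (1 - eps) * D * size:
                return False              # some small left-set expands too little
    return True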

Page 36: Learning  Submodular  Functions

Lossless Expander Graphs

• Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if
  – every u ∈ U has degree D
  – |Γ(S)| ≥ (1−ε)·D·|S|  ∀ S ⊆ U with |S| ≤ K,
  where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }

"Neighborhoods of left-vertices are K-wise-almost-disjoint"

[Diagram: a bipartite graph with left vertices U and right vertices V]

Page 37: Learning  Submodular  Functions

Trivial Case: Disjoint Neighborhoods

• Definition: G = (U ∪ V, E) is a (D, K, ε)-lossless expander if
  – every u ∈ U has degree D
  – |Γ(S)| ≥ (1−ε)·D·|S|  ∀ S ⊆ U with |S| ≤ K,
  where Γ(S) = { v ∈ V : ∃ u ∈ S s.t. {u,v} ∈ E }

• If left-vertices have disjoint neighborhoods, this gives an expander with ε = 0, K = 1

[Diagram: a bipartite graph in which left vertices have disjoint neighborhoods]

Page 38: Learning  Submodular  Functions

Main Theorem: Trivial Case

• Suppose G = (U ∪ V, E) has disjoint left-neighborhoods.

• Let A = {A_1, …, A_m} be defined by A = { Γ(u) : u ∈ U }.

• Let b_1, …, b_m be non-negative integers.

• Theorem:

I = { I : |I ∩ ∪_{j∈J} A_j| ≤ Σ_{j∈J} b_j  ∀ J } = { I : |I ∩ A_j| ≤ b_j  ∀ j }

is the family of independent sets of a matroid: a partition matroid. (A small code sketch follows below.)

[Diagram: left vertices u_1, u_2, u_3 with disjoint neighborhoods A_1, A_2, … in V, capped at ≤ b_1, ≤ b_2, …]
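A minimal Python sketch of the trivial (partition matroid) case above: with disjoint parts A_j and caps b_j, a set I is independent iff |I ∩ A_j| ≤ b_j for every j. Names are illustrative.

def partition_matroid_independent(I, parts, caps):
    """parts: list of disjoint sets A_j; caps: list of integers b_j."""
    I = set(I)
    return all(len(I & set(A_j)) <= b_j for A_j, b_j in zip(parts, caps))

# parts = [{1, 2, 3}, {4, 5}]; caps = [1, 2]
# print(partition_matroid_independent({1, 4, 5}, parts, caps))  # True
# print(partition_matroid_independent({1, 2}, parts, caps))     # False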

Page 39: Learning  Submodular  Functions

Main Theorem

• Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander

• Let A = {A_1, …, A_m} be defined by A = { Γ(u) : u ∈ U }

• Let b_1, …, b_m satisfy b_i ≥ 4εD ∀ i

[Diagram: neighborhoods A_1, A_2 in V, capped at ≤ b_1, ≤ b_2]

Page 40: Learning  Submodular  Functions

Main Theorem

• Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander

• Let A = {A_1, …, A_m} be defined by A = { Γ(u) : u ∈ U }

• Let b_1, …, b_m satisfy b_i ≥ 4εD ∀ i

• "Desired Theorem": I is a matroid, where

I = { I : |I ∩ ∪_{j∈J} A_j| ≤ Σ_{j∈J} b_j  ∀ J }

Page 41: Learning  Submodular  Functions

Main Theorem

• Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander

• Let A = {A_1, …, A_m} be defined by A = { Γ(u) : u ∈ U }

• Let b_1, …, b_m satisfy b_i ≥ 4εD ∀ i

• Theorem: I is a matroid, where

I = { I : |I ∩ ∪_{j∈J} A_j| ≤ Σ_{j∈J} b_j − ( Σ_{j∈J} |A_j| − |∪_{j∈J} A_j| )  ∀ J s.t. |J| ≤ K,  and |I| ≤ εDK }

Page 42: Learning  Submodular  Functions

Main Theorem

• Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander

• Let A = {A_1, …, A_m} be defined by A = { Γ(u) : u ∈ U }

• Let b_1, …, b_m satisfy b_i ≥ 4εD ∀ i

• Theorem: I is a matroid, where

I = { I : |I ∩ ∪_{j∈J} A_j| ≤ Σ_{j∈J} b_j − ( Σ_{j∈J} |A_j| − |∪_{j∈J} A_j| )  ∀ J s.t. |J| ≤ K,  and |I| ≤ εDK }

• Trivial case: G has disjoint neighborhoods, i.e., K = 1 and ε = 0. Then |J| ≤ 1, the correction term Σ_{j∈J} |A_j| − |∪_{j∈J} A_j| equals 0, and the constraints reduce to the partition matroid constraints |I ∩ A_j| ≤ b_j ∀ j.

Page 43: Learning  Submodular  Functions

LB for Learning Submodular Functions

[Plot: a function over set sizes from ∅ to V with deep valleys at A_1, A_2, …; the value is n^{1/3} at the top and log² n at the bottom of a valley]

• How deep can we make the "valleys"?

Page 44: Learning  Submodular  Functions

LB for Learning Submodular Functions

• Let G = (U ∪ V, E) be a (D, K, ε)-lossless expander, where A_i = Γ(u_i) and
  – |V| = n, |U| = n^{log n}
  – D = K = n^{1/3}, ε = log²(n)/n^{1/3}

• Such graphs exist by the probabilistic method

• Lower Bound Proof:
  – Delete each node in U with prob. ½, then use the main theorem to get a matroid
  – If u_i ∈ U was not deleted then r(A_i) ≤ b_i = 4εD = O(log² n)
  – Claim: if u_i was deleted then A_i ∈ I (needs a proof) ⇒ r(A_i) = |A_i| = D = n^{1/3}
  – Since # A_i's = |U| = n^{log n}, no algorithm can learn a significant fraction of the r(A_i) values in polynomial time

Page 45: Learning  Submodular  Functions

Summary

• PMAC model for learning real-valued functions

• Learning under arbitrary distributions:
  – Factor O(n^{1/2}) algorithm
  – Factor Ω(n^{1/3}) hardness (info-theoretic)

• Learning under product distributions:
  – Factor O(log(1/ε)) algorithm

• New general family of matroids
  – Generalizes partition matroids to non-disjoint parts

Page 46: Learning  Submodular  Functions

Open Questions

• Improve the Ω(n^{1/3}) lower bound to Ω(n^{1/2})

• Explicit construction of expanders

• Non-monotone submodular functions
  – Any algorithm?
  – Lower bound better than Ω(n^{1/3})?

• For the algorithm under the uniform distribution, relax the 1-Lipschitz condition