
Recent Advances in Structured Sparse Models

Julien Mairal

Willow group - INRIA - ENS - Paris

21 September 2010

LEAR seminar, Grenoble, September 21st, 2010


Acknowledgements

Francis Bach, Rodolphe Jenatton, Guillaume Obozinski


What is this talk about?

Sparse Linear Models

Not only sparse, but also structured!

Solving challenging optimization problems

Developing new applications of sparse models in computer vision and machine learning

Related publications:

[1] J. Mairal, R. Jenatton, G. Obozinski and F. Bach. Network Flow Algorithms for Structured Sparsity. NIPS, 2010

[2] R. Jenatton, J. Mairal, G. Obozinski and F. Bach. Proximal Methods for Hierarchical Sparse Coding. arXiv:1009.2139v1

[3] R. Jenatton, J. Mairal, G. Obozinski and F. Bach. Proximal Methods for Sparse Hierarchical Dictionary Learning. ICML, 2010


Sparse Linear Model: Machine Learning Point of View

Let (y_i, x_i)_{i=1}^n be a training set, where the vectors x_i are in R^p and are called features. The scalars y_i are in

{−1, +1} for binary classification problems.

{1, . . . , N} for multiclass classification problems.

R for regression problems.

In a linear model, one assumes a relation y ≈ w^⊤x, and solves

min_{w ∈ R^p} (1/n) ∑_{i=1}^n ℓ(y_i, w^⊤x_i) + λ Ω(w),

where the sum is the data-fitting term and λ Ω(w) is the regularization.


Sparse Linear Models: Machine Learning Point of View

A few examples:

Ridge regression: min_{w ∈ R^p} (1/2n) ∑_{i=1}^n (y_i − w^⊤x_i)^2 + λ‖w‖_2^2.

Linear SVM: min_{w ∈ R^p} (1/n) ∑_{i=1}^n max(0, 1 − y_i w^⊤x_i) + λ‖w‖_2^2.

Logistic regression: min_{w ∈ R^p} (1/n) ∑_{i=1}^n log(1 + e^{−y_i w^⊤x_i}) + λ‖w‖_2^2.

The squared ℓ2-norm induces smoothness in w. When one knows in advance that w should be sparse, one should use a sparsity-inducing regularization such as the ℓ1-norm. [Chen et al., 1999, Tibshirani, 1996]

The purpose of this talk is to encode additional a priori knowledge in the regularization.
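As a quick illustration of this contrast (my addition, not from the talk), a minimal scikit-learn sketch on synthetic data where only 5 of 30 features matter; the regularization weights alpha are arbitrary choices:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
n, p = 100, 30
X = rng.randn(n, p)
w_true = np.zeros(p)
w_true[:5] = rng.randn(5)                  # only 5 relevant features
y = X @ w_true + 0.1 * rng.randn(n)

ridge = Ridge(alpha=1.0).fit(X, y)         # squared l2 penalty: smooth, dense w
lasso = Lasso(alpha=0.1).fit(X, y)         # l1 penalty: sparse w

print("nonzeros (ridge):", np.sum(ridge.coef_ != 0))   # typically all 30
print("nonzeros (lasso):", np.sum(lasso.coef_ != 0))   # close to 5
```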


Sparse Linear Models: Signal Processing Point of View

Let y in Rn be a signal.

Let D = [d1, . . . ,dp] ∈ Rn×p be a set of

normalized “basis vectors”.We call it dictionary.

D is “adapted” to y if it can represent it with a few basis vectors, that is, if there exists a sparse vector w in R^p such that y ≈ Dw. We call w the sparse code.

y ≈ Dw, with y ∈ R^n, D = [d_1, d_2, . . . , d_p] ∈ R^{n×p}, and w ∈ R^p sparse.


Sparse Linear Models: the Lasso

Signal processing: D is a dictionary in R^{n×p},

min_{w ∈ R^p} (1/2)‖y − Dw‖_2^2 + λ‖w‖_1.

Machine learning:

min_{w ∈ R^p} (1/2n) ∑_{i=1}^n (y_i − x_i^⊤w)^2 + λ‖w‖_1 = min_{w ∈ R^p} (1/2n)‖y − X^⊤w‖_2^2 + λ‖w‖_1,

with X = [x_1, . . . , x_n] and y = [y_1, . . . , y_n]^⊤.

A useful tool in signal processing, machine learning, statistics, neuroscience, . . . whenever one wishes to select features.


Why does the ℓ1-norm induce sparsity? Analysis of the norms in 1D

Ω(w) = w^2, with Ω′(w) = 2w.

Ω(w) = |w|, with Ω′_−(w) = −1 and Ω′_+(w) = 1.

The gradient of the squared ℓ2-norm vanishes when w gets close to 0. On its differentiable part, the norm of the gradient of the ℓ1-norm is constant.


Why does the ℓ1-norm induce sparsity? Geometric explanation


min_{w ∈ R^p} (1/2)‖y − Dw‖_2^2 + λ‖w‖_1

min_{w ∈ R^p} ‖y − Dw‖_2^2 s.t. ‖w‖_1 ≤ T.


Other Sparsity-Inducing Norms

min_{w ∈ R^p} f(w) + λ Ω(w),

where f is the data-fitting term and Ω is a sparsity-inducing norm.

The most popular choice for Ω: the ℓ1-norm, ‖w‖_1 = ∑_{j=1}^p |w_j|.

However, the ℓ1-norm encodes little information beyond cardinality!

Another popular choice for Ω: the ℓ1-ℓq norm [Yuan and Lin, 2006], with q = 2 or q = ∞:

∑_{g ∈ G} ‖w_g‖_q, with G a partition of {1, . . . , p}.

The ℓ1-ℓq norm sets groups of non-overlapping variables to zero (as opposed to single variables for the ℓ1-norm), as sketched below.
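To make the group effect concrete (my addition), a small NumPy sketch of the ℓ1-ℓ2 norm over a partition, together with its proximal operator, which for non-overlapping groups is block soft-thresholding and zeroes out whole groups at once:

```python
import numpy as np

def group_norm(w, groups):
    # l1-l2 norm: sum of the l2 norms of w over the partition `groups`
    return sum(np.linalg.norm(w[g]) for g in groups)

def prox_group_lasso(u, groups, lam):
    # prox of lam * sum_g ||w_g||_2 for non-overlapping groups:
    # block soft-thresholding, which zeroes out whole groups
    w = u.copy()
    for g in groups:
        ng = np.linalg.norm(u[g])
        w[g] = 0.0 if ng <= lam else (1.0 - lam / ng) * u[g]
    return w

groups = [np.arange(0, 3), np.arange(3, 6)]    # a partition of {0,...,5}
u = np.array([0.1, -0.2, 0.1, 2.0, -1.0, 0.5])
print(prox_group_lasso(u, groups, lam=0.5))    # the first group vanishes entirely
```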


Sparsity-Inducing Norms

Ω(w) = ∑_{g ∈ G} ‖w_g‖_q

Applications of group sparsity:

Selecting groups of features instead of individual variables.

Multi-task learning.

Multiple kernel learning.

Drawbacks:

Requires a partition of the features.

Encodes fixed/static information.

What happens when the groups overlap? [Jenatton et al., 2009]

Inside the groups, the ℓ2-norm (or ℓ∞) does not promote sparsity.

Variables belonging to the same groups are encouraged to be set to zero together.


Examples of set of groups G (1/3) [Jenatton et al., 2009]

Selection of contiguous patterns on a sequence, p = 6.

G is the set of blue groups.

Any union of blue groups set to zero leads to the selection of a contiguous pattern.


Examples of set of groups G (2/3) [Jenatton et al., 2009]

Selection of rectangles on a 2-D grid, p = 25.

G is the set of blue/green groups (with their complements, which are not displayed).

Any union of blue/green groups set to zero leads to the selection of a rectangle.


Examples of set of groups G (3/3) [Jenatton et al., 2009]

Selection of diamond-shaped patterns on a 2-D grid, p = 25.

It is possible to extend such settings to 3-D spaces, or to more complex topologies.


Hierarchical Norms [Zhao et al., 2009, Bach, 2009]

A node can be active only if its ancestors are active. The selected patterns are rooted subtrees.


Group Lasso + Sparsity [Sprechmann et al., 2010]


Application 1:

Wavelet denoising with hierarchical norms


Wavelet denoising with hierarchical norms [Jenatton, Mairal, Obozinski, and Bach, 2010b]

Classical wavelet denoising [Donoho and Johnstone, 1995]:

min_{w ∈ R^p} (1/2)‖x − Dw‖_2^2 + λ‖w‖_1.

When D is orthogonal, the solution is obtained via soft-thresholding.
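A minimal sketch of this fact (my addition): when D^⊤D = I, the Lasso decouples coordinate-wise and the solution is soft-thresholding of the analysis coefficients D^⊤x.

```python
import numpy as np

def soft_threshold(a, lam):
    # closed-form prox of lam * ||.||_1
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def denoise_orthogonal(x, D, lam):
    # solves min_w 0.5*||x - D w||_2^2 + lam*||w||_1 when D^T D = I:
    # the problem decouples, so w* = soft_threshold(D^T x, lam)
    w = soft_threshold(D.T @ x, lam)
    return D @ w                              # denoised signal

# tiny check with a random orthogonal D (via a QR factorization)
rng = np.random.RandomState(0)
D, _ = np.linalg.qr(rng.randn(8, 8))
x = D[:, 0] + 0.1 * rng.randn(8)              # one atom plus noise
x_hat = denoise_orthogonal(x, D, lam=0.2)
```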


Wavelet denoising with hierarchical norms [Jenatton, Mairal, Obozinski, and Bach, 2010b]

Wavelet denoising with a hierarchical norm: add the a priori knowledge that the coefficients are embedded in a tree.

[Figure: Barbara image, σ = 50: (a) ℓ1 denoising; (b) tree-structured denoising]


Wavelet denoising with hierarchical norms [Jenatton, Mairal, Obozinski, and Bach, 2010b]

Benchmark on a database of 12 standard images:

Haar wavelets:

          σ     ℓ0       ℓ1           Ωℓ2          Ωℓ∞
PSNR      5    34.48    35.52        35.89        35.79
         10    29.63    30.74        31.40        31.23
         25    24.44    25.30        26.41        26.14
         50    21.53    20.42        23.41        23.05
        100    19.27    19.43        20.97        20.58
IPSNR     5     -       1.04 ± .31   1.41 ± .45   1.31 ± .41
         10     -       1.10 ± .22   1.76 ± .26   1.59 ± .22
         25     -        .86 ± .35   1.96 ± .22   1.69 ± .21
         50     -        .46 ± .28   1.87 ± .20   1.51 ± .20
        100     -        .15 ± .23   1.69 ± .19   1.30 ± .19


Application 2:

Hierarchical Dictionary Learning


Hierarchical Dictionary Learning [Olshausen and Field, 1997, Elad and Aharon, 2006, Mairal et al., 2010a]

We now consider a sequence (y_i)_{i=1}^m of signals in R^n.

min_{W ∈ R^{p×m}, D ∈ C} ∑_{i=1}^m (1/2)‖y_i − Dw_i‖_2^2 + λ‖w_i‖_1.

This can be rewritten as a matrix factorization problem

min_{W ∈ R^{p×m}, D ∈ C} (1/2)‖Y − DW‖_F^2 + λ‖W‖_{1,1}.

[Jenatton, Mairal, Obozinski, and Bach, 2010a]: What about replacing the ℓ1-norm by a hierarchical norm?
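To make the formulation concrete (my addition; the references above use more sophisticated online and block-coordinate solvers), a rough alternating-minimization sketch: ISTA sparse coding of W with D fixed, then a projected gradient step on D with W fixed, keeping the columns of D in the unit ℓ2-ball as the constraint set C.

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def dict_learning(Y, p=32, lam=0.1, n_outer=20, n_inner=50, seed=0):
    # rough solver for min_{D,W} 0.5*||Y - D W||_F^2 + lam*||W||_{1,1},
    # alternating ISTA on W and a projected gradient step on D
    rng = np.random.RandomState(seed)
    n, m = Y.shape
    D = rng.randn(n, p)
    D /= np.linalg.norm(D, axis=0)             # start with unit-norm atoms
    W = np.zeros((p, m))
    for _ in range(n_outer):
        L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant for W
        for _ in range(n_inner):                # sparse coding, D fixed
            W = soft_threshold(W - D.T @ (D @ W - Y) / L, lam / L)
        L_D = np.linalg.norm(W, 2) ** 2 + 1e-12
        D -= (D @ W - Y) @ W.T / L_D            # gradient step on D
        D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)  # project onto C
    return D, W
```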


Hierarchical Dictionary Learning: Dictionaries learned with the ℓ1-norm


Hierarchical Dictionary Learning [Jenatton, Mairal, Obozinski, and Bach, 2010a]


Application to patch reconstruction [Jenatton, Mairal, Obozinski, and Bach, 2010a]

Reconstruction of 100,000 8 × 8 natural image patches:

Remove randomly subsampled pixels.

Reconstruct with matrix factorization and structured sparsity.

noise   50 %          60 %          70 %          80 %          90 %
flat    19.3 ± 0.1    26.8 ± 0.1    36.7 ± 0.1    50.6 ± 0.0    72.1 ± 0.0
tree    18.6 ± 0.1    25.7 ± 0.1    35.0 ± 0.1    48.0 ± 0.0    65.9 ± 0.3



Hierarchical Topic Models for text corpora [Jenatton, Mairal, Obozinski, and Bach, 2010a]

Each document is modeled through word counts

Low-rank matrix factorization of word-document matrix

Probabilistic topic models such as Latent Dirichlet Allocation [Blei et al., 2003]

Organise the topics in a tree.

Previously approached using non-parametric Bayesian methods (hierarchical Chinese restaurant process and nested Dirichlet process): [Blei et al., 2010]

Can we achieve similar performance with a simple matrix factorization formulation?


Tree of Topics


Classification based on topics

Comparison on predicting newsgroup article subjects

20 Newsgroups articles (1425 documents, 13,312 words)

[Figure: classification accuracy (%) as a function of the number of topics (3, 7, 15, 31, 63), comparing PCA + SVM, NMF + SVM, LDA + SVM, SpDL + SVM, and SpHDL + SVM]


Application 3:

Background Subtraction


Background Subtraction

Given a video sequence, how can we remove foreground objects?

[Figure: frames from video sequence 1 and video sequence 2]


Background Subtraction

x (frame) ≈ Dw (linear combination of background frames) + e (error term).

Solved by

min_{w ∈ R^p, e ∈ R^m} (1/2)‖x − Dw − e‖_2^2 + λ_1‖w‖_1 + λ_2 Ω(e).

Same idea as Wright et al. [2009] for robust face recognition, where Ω = ℓ1.

We are going to use overlapping groups with 3 × 3 neighborhoods to add spatial consistency. See also Cevher et al. [2008] for structured sparsity + background subtraction.
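To see the formulation in action (my addition), a sketch of the simpler case Ω = ℓ1, i.e. the setting of Wright et al.; the structured Ω of the talk would replace the last soft-thresholding with the group prox machinery described later. ISTA runs on the stacked variable (w, e), whose data-fitting gradient comes from the operator [D, I]:

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def background_subtract(x, D, lam1=0.1, lam2=0.1, n_iter=500):
    # min_{w,e} 0.5*||x - D w - e||_2^2 + lam1*||w||_1 + lam2*||e||_1
    # solved by ISTA on the stacked variable; e is the sparse foreground
    m, p = D.shape
    w, e = np.zeros(p), np.zeros(m)
    L = np.linalg.norm(D, 2) ** 2 + 1.0       # Lipschitz constant of [D, I]
    for _ in range(n_iter):
        r = D @ w + e - x                     # shared residual (gradient point)
        w = soft_threshold(w - D.T @ r / L, lam1 / L)
        e = soft_threshold(e - r / L, lam2 / L)
    return D @ w, e                           # background, foreground
```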


Background Subtraction

[Figure: (a) input, (b) estimated background, (c) foreground with ℓ1, (d) foreground with ℓ1 + structured norm, (e) another example]


How do we optimize all that?


First-order/proximal methods

min_{w ∈ R^p} f(w) + λ Ω(w)

f is strictly convex and differentiable with a Lipschitz gradient.

Generalizes the idea of gradient descent:

w^{k+1} ← argmin_{w ∈ R^p} f(w^k) + ∇f(w^k)^⊤(w − w^k) + (L/2)‖w − w^k‖_2^2 + λ Ω(w),

where the first two terms form the linear approximation of f and the third is the quadratic term; equivalently,

w^{k+1} ← argmin_{w ∈ R^p} (1/2)‖w − (w^k − (1/L)∇f(w^k))‖_2^2 + (λ/L) Ω(w).

When λ = 0, w^{k+1} ← w^k − (1/L)∇f(w^k): this is equivalent to a classical gradient descent step.
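In code, once the proximal operator of Ω is available, the whole scheme is a short loop. A generic sketch (my addition), parameterized by the gradient of f, a Lipschitz constant L, and a prox callable:

```python
import numpy as np

def proximal_gradient(grad_f, prox, w0, L, lam, n_iter=100):
    # iterate w <- prox_{(lam/L) * Omega}(w - grad_f(w) / L)
    w = w0.copy()
    for _ in range(n_iter):
        w = prox(w - grad_f(w) / L, lam / L)
    return w
```

For the Lasso, grad_f(w) = D^⊤(Dw − y), L = ‖D‖_2^2 (the squared spectral norm of D), and prox is the soft-thresholding operator of the next slide.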


First-order/proximal methods

They require solving the proximal operator efficiently:

min_{w ∈ R^p} (1/2)‖u − w‖_2^2 + λ Ω(w).

For the ℓ1-norm, this amounts to soft-thresholding:

w_i^⋆ = sign(u_i)(|u_i| − λ)_+.

There exist accelerated versions based on Nesterov's optimal first-order method (a gradient method with “extrapolation”) [Beck and Teboulle, 2009, Nesterov, 2007, 1983], suited for large-scale experiments.
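A sketch of such an accelerated variant (my addition, following the FISTA recipe of Beck and Teboulle): the same prox step, but taken at an extrapolated point.

```python
import numpy as np

def fista(grad_f, prox, w0, L, lam, n_iter=100):
    # accelerated proximal gradient with FISTA-style extrapolation
    w, z, t = w0.copy(), w0.copy(), 1.0
    for _ in range(n_iter):
        w_next = prox(z - grad_f(z) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w_next + ((t - 1.0) / t_next) * (w_next - w)   # extrapolation
        w, t = w_next, t_next
    return w
```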


Tree-structured groups

Proposition [Jenatton, Mairal, Obozinski, and Bach, 2010a]

If G is a tree-structured set of groups, i.e., ∀ g, h ∈ G, g ∩ h = ∅ or g ⊂ h or h ⊂ g, then, for q = 2 or q = ∞, define Prox^g and Prox_Ω as

Prox^g : u ↦ argmin_{w ∈ R^p} (1/2)‖u − w‖_2^2 + λ‖w_g‖_q,

Prox_Ω : u ↦ argmin_{w ∈ R^p} (1/2)‖u − w‖_2^2 + λ ∑_{g ∈ G} ‖w_g‖_q.

If the groups are sorted from the leaves to the root, then

Prox_Ω = Prox^{g_m} ◦ · · · ◦ Prox^{g_1}.

→ Tree-structured regularization: efficient linear-time algorithm.
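A sketch of the composition for q = 2 (my addition): each Prox^g is block soft-thresholding on the coordinates of g, applied in leaves-to-root order.

```python
import numpy as np

def prox_group(u, g, lam):
    # prox of lam * ||w_g||_2: soft-threshold the block g of u
    w = u.copy()
    ng = np.linalg.norm(u[g])
    w[g] = 0.0 if ng <= lam else (1.0 - lam / ng) * u[g]
    return w

def prox_tree(u, groups_leaves_to_root, lam):
    # exact prox of the tree-structured norm (q = 2) by the proposition:
    # compose the group proxes from the leaves up to the root
    w = u.copy()
    for g in groups_leaves_to_root:
        w = prox_group(w, g, lam)
    return w

# toy tree on 3 variables: group {2} is a child of the root {0, 1, 2}
groups = [np.array([2]), np.array([0, 1, 2])]
print(prox_tree(np.array([0.3, -0.4, 0.05]), groups, lam=0.1))
```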


General Overlapping Groups for q = ∞

Dual formulation [Jenatton, Mairal, Obozinski, and Bach, 2010a]

The solutions w⋆ and ξ⋆ of the following optimization problems,

min_{w ∈ R^p} (1/2)‖u − w‖_2^2 + λ ∑_{g ∈ G} ‖w_g‖_∞,   (Primal)

min_{ξ ∈ R^{p×|G|}} (1/2)‖u − ∑_{g ∈ G} ξ^g‖_2^2 s.t. ∀ g ∈ G, ‖ξ^g‖_1 ≤ λ and ξ_j^g = 0 if j ∉ g,   (Dual)

satisfy

w⋆ = u − ∑_{g ∈ G} ξ^{⋆g}.   (Primal-dual relation)

The dual formulation has more variables, but no overlapping constraints.


General Overlapping Groups for q = ∞ [Mairal, Jenatton, Obozinski, and Bach, 2010b]

First step: flip the signs of u.

The dual is equivalent to a quadratic min-cost flow problem:

min_{ξ ∈ R_+^{p×|G|}} (1/2)‖u − ∑_{g ∈ G} ξ^g‖_2^2 s.t. ∀ g ∈ G, ∑_{j ∈ g} ξ_j^g ≤ λ and ξ_j^g = 0 if j ∉ g.


General Overlapping Groups for q = ∞. Example: G = {g}, with g = {1, . . . , p}.

min_{ξ^g ∈ R_+^p} (1/2)‖u − ξ^g‖_2^2 s.t. ∑_{j=1}^p ξ_j^g ≤ λ.

[Figure: min-cost flow graph with source s, a group node g enforcing ξ_1^g + ξ_2^g + ξ_3^g ≤ λ, variable nodes u_1, u_2, u_3, and sink t, with arcs (ξ_j, c_j).]

Figure: G = {g = {1, 2, 3}}, ∀ j, c_j = (1/2)(u_j − ξ_j)^2.
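For this single-group example there is also a closed-form route (my addition, bypassing the flow formulation): by Moreau decomposition, prox_{λ‖·‖∞}(u) = u − Π(u), where Π is the Euclidean projection onto the ℓ1-ball of radius λ, which is exactly the dual variable ξ⋆ above. A sort-based projection sketch:

```python
import numpy as np

def project_l1_ball(u, radius):
    # Euclidean projection onto {x : ||x||_1 <= radius},
    # via the classical sort-based algorithm on |u|
    if np.abs(u).sum() <= radius:
        return u.copy()
    a = np.sort(np.abs(u))[::-1]              # sorted magnitudes, descending
    cssa = np.cumsum(a)
    k = np.nonzero(a * np.arange(1, u.size + 1) > cssa - radius)[0][-1]
    tau = (cssa[k] - radius) / (k + 1.0)
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def prox_linf(u, lam):
    # prox of lam * ||.||_inf via Moreau decomposition
    return u - project_l1_ball(u, lam)

u = np.array([3.0, -1.0, 0.5])
print(prox_linf(u, lam=1.5))                  # the largest entry is shrunk
```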


General Overlapping Groups for q = ∞. Example with two overlapping groups.

min_{ξ ∈ R_+^{p×|G|}} (1/2)‖u − ∑_{g ∈ G} ξ^g‖_2^2 s.t. ∀ g ∈ G, ∑_{j ∈ g} ξ_j^g ≤ λ and ξ_j^g = 0 if j ∉ g.

[Figure: min-cost flow graph with source s, group nodes g and h enforcing ξ_1^g + ξ_2^g ≤ λ and ξ_2^h + ξ_3^h ≤ λ, variable nodes u_1, u_2, u_3, and sink t, with arcs (ξ_j, c_j).]

Figure: G = {g = {1, 2}, h = {2, 3}}, ∀ j, c_j = (1/2)(u_j − ξ_j)^2.


General Overlapping Groups for q = ∞ [Mairal, Jenatton, Obozinski, and Bach, 2010b]

Main ideas of the algorithm: Divide and conquer

1. Solve a relaxed problem in linear time.

2. Test the feasibility of the solution for the “non-relaxed” problem with a max-flow.

3. If the solution is feasible, it is optimal; stop the algorithm.

4. If not, find a minimum cut and remove the arcs along the cut.

5. Recursively process each part of the graph.

The algorithm is guaranteed to converge to the solution.

See the paper for more details.


Conclusions

We have developed efficient and large-scale algorithmic tools for solving structured sparse decomposition problems.

These tools are related to network flow optimization.

The hierarchical case can be solved at the same cost as ℓ1.

There are preliminary applications in computer vision; there should be more!


References I

F. Bach. High-dimensional non-linear variable selection through hierarchical kernel learning. Technical report, arXiv:0909.0844, 2009.

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.

D. Blei, T. Griffiths, and M. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):1–30, 2010.

V. Cevher, M. F. Duarte, C. Hegde, and R. G. Baraniuk. Sparse signal recovery using Markov random fields. In Advances in Neural Information Processing Systems, 2008.

S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61, 1999.

D. L. Donoho and I. M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90(432):1200–1224, 1995.


References II

M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, December 2006.

R. Jenatton, J-Y. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Technical report, 2009. Preprint arXiv:0904.3523v1.

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the International Conference on Machine Learning (ICML), 2010a.

R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. Proximal methods for hierarchical sparse coding. Technical report, 2010b. Submitted, arXiv:1009.2139.

J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 2010a.

J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In Advances in Neural Information Processing Systems, 2010b.

Y. Nesterov. A method for solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Dokl., 27:372–376, 1983.

Y. Nesterov. Gradient methods for minimizing composite objective function. Technical report, CORE, 2007.


References III

B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.

P. Sprechmann, I. Ramirez, G. Sapiro, and Y. C. Eldar. Collaborative hierarchical sparse modeling. Technical report, 2010. Preprint arXiv:1003.0400v1.

R. Tibshirani. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society. Series B, 58(1):267–288, 1996.

J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 210–227, 2009.

M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68:49–67, 2006.

P. Zhao, G. Rocha, and B. Yu. The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics, 37(6A):3468–3497, 2009.
