An Introduction to Sparse Coding,
Sparse Sensing, and Optimization
Speaker: Wei-Lun Chao
Date: Nov. 23, 2011
DISP Lab, Graduate Institute of Communication Engineering, National Taiwan University
Outline
• Introduction
• The fundamentals of optimization
• The idea of sparsity: coding vs. sensing
• The solution
• The importance of dictionary
• Applications
Introduction
Introduction
• What is sparsity?
• Usage:
Compression
Analysis
Representation
Fast / sparse sensing

(Figure: projection bases and reconstruction bases.)
Introduction
• Why do we use Fourier transform and its modifications
for image and acoustic compression?
Differentiability (theoretical)
Intrinsic sparsity (data-dependent)
Human perception (human-centric)
• Better bases for compression or representation?
Wavelets
How about data-dependent bases?
How about learning?
Introduction
• Optimization
Frequently faced in algorithm design
Used to implement your creative ideas
• Issue
What kinds of mathematical forms, and which corresponding
optimization algorithms, guarantee convergence to
local or global optima?
The Fundamental of Optimization
A Warming-up Question
• How do you solve the following problem?

$$\min_w f(w) = (w - 5)^2$$

(a) Plot $f(w)$ and read off the minimum. (Figure: a curve with local minima and a global minimum; here the global minimum is at $w = 5$.)
(b) Take the derivative and check where it equals 0.
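As a quick numerical sanity check (a sketch using SciPy, not part of the original slides), approach (b) gives $f'(w) = 2(w - 5) = 0$, i.e. $w = 5$, and a solver agrees:

```python
from scipy.optimize import minimize_scalar

# f(w) = (w - 5)^2; approach (b): f'(w) = 2(w - 5) = 0  =>  w = 5
f = lambda w: (w - 5) ** 2

res = minimize_scalar(f)   # Brent's method, no derivative needed
print(res.x)               # -> 5.0 (up to solver tolerance)
```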
An Advanced Question
• How about the following problems?

(3) $\min_w \sum_{n=1}^{N} f_n(w)$
(a) Plot? (b) Take derivative = 0?

(4) $\min_w f(w) = (w - 5)^2$, s.t. $w \le 3$
(Figure: the feasible region $w \le 3$ excludes the unconstrained minimum at $w = 5$.)

(5) $\min_{\mathbf{w}} \sum_{n=1}^{N} f_n(\mathbf{w})$, s.t. $w_i \le b_i$

Derivative? How to proceed?
Illustration
• 2-D case:

$$\min_{w_1, w_2} f(w_1, w_2), \;\text{s.t.}\; g(w_1, w_2) = b$$

(Figure: contour lines $f(w_1, w_2) = 1, 2, \ldots, 6$ in the $(w_1, w_2)$ plane, together with the constraint curve $g(w_1, w_2) = b$.)
How to Solve?
• Thanks to……
Lagrange multiplier
Linear programming, quadratic programming, and recently,
convex optimization
• Standard form:

$$\min_{\mathbf{w}} f_0(\mathbf{w})$$
$$\text{s.t.}\;\; h_i(\mathbf{w}) = b_i, \; i = 1, \ldots, m$$
$$\quad\;\;\; g_i(\mathbf{w}) \le c_i, \; i = 1, \ldots, n$$
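A tiny instance of this standard form can be solved directly with an off-the-shelf solver; the objective and constraint below are made-up illustrations, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize

# Toy standard-form instance:
#   f0(w) = (w1 - 1)^2 + (w2 - 2)^2
#   s.t. h1(w) = w1 + w2 = 1        (one equality constraint)
f0 = lambda w: (w[0] - 1) ** 2 + (w[1] - 2) ** 2
cons = [{"type": "eq", "fun": lambda w: w[0] + w[1] - 1}]

res = minimize(f0, x0=np.zeros(2), constraints=cons, method="SLSQP")
print(res.x)   # -> [0, 1]: the projection of (1, 2) onto the line w1 + w2 = 1
```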
Fallacy
• A quadratic programming problem with constraints:

$$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2, \;\text{s.t.}\; x_i \ge 0, \quad A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]$$

$\mathbf{x}$: the importance of each food; $\mathbf{b}$: personal nutrient need; $\mathbf{a}_i$: nutrient content of each food.

(1) Take the derivative (x): $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$, then choose the components with $x_i \ge 0$ — this does not solve the constrained problem.
(2) Quadratic programming (o)
(3) Sparse coding (o)
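The constrained problem above is exactly nonnegative least squares, for which SciPy has a dedicated solver; the nutrient numbers below are hypothetical:

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical toy numbers: rows = 3 nutrients, columns = 4 foods (the a_i).
A = np.array([[2.0, 0.0, 1.0, 3.0],
              [1.0, 4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0, 2.0]])
b = np.array([4.0, 6.0, 7.0])       # personal nutrient need

x, residual = nnls(A, b)            # solves min ||Ax - b||_2  s.t.  x_i >= 0
print(x, residual)                  # every entry of x is nonnegative
```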
The Idea of Sparsity
What is Sparsity?
• Think about this problem:

$$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2, \quad A = [\mathbf{a}_1 \; \cdots \; \mathbf{a}_N] \in \mathbb{R}^{d \times N}, \; \mathbf{x} \in \mathbb{R}^N, \; \mathbf{b} \in \mathbb{R}^d$$

Assume $A$ has full rank and $N > d$: many $\mathbf{x}$ achieve $\|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$. Which one do you want? Choose the $\mathbf{x}$ with the fewest nonzero components:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$$
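On a tiny instance, the sparsest of the many exact solutions can be found by brute force over supports (a sketch; the sizes and the 1-sparse ground truth are made up):

```python
import numpy as np
from itertools import combinations

def sparsest_solution(A, b, tol=1e-9):
    """Exhaustive l0 search: try supports of growing size k and return the
    first x whose support solves Ax = b exactly (up to tol)."""
    N = A.shape[1]
    for k in range(1, N + 1):
        for S in map(list, combinations(range(N), k)):
            xs, *_ = np.linalg.lstsq(A[:, S], b, rcond=None)
            if np.linalg.norm(A[:, S] @ xs - b) < tol:
                x = np.zeros(N)
                x[S] = xs
                return x
    return None

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 6))     # full rank, N = 6 > d = 3
x_true = np.zeros(6)
x_true[2] = 1.5                     # a 1-sparse ground truth
b = A @ x_true

x_hat = sparsest_solution(A, b)
print(x_hat)                        # only component 2 is nonzero
```

The nested loops make the cost exponential in N, which is exactly why this only works on toy problems.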
Why Sparsity?
• The more concise, the better.
• In some domains, there naturally exists a sparse latent vector that controls the data we observe (e.g., MRI, music).
• In some domains, samples from the same class have the sparsity property.
• The domain can be learned.

$$\mathbf{b} = A\mathbf{x} = [\mathbf{a}_1 \; \cdots \; \mathbf{a}_N] \, \mathbf{x} \; (+ \text{noise}), \quad \|\mathbf{x}\|_0 \le k$$

A $k$-sparse domain means that each $\mathbf{b}$ can be constructed from an $\mathbf{x}$ vector with at most $k$ nonzero elements.
Sparse Sensing VS. Sparse Coding
• Assume that we have $A \in \mathbb{R}^{d \times N}$, $N > d$. Now an observation $\mathbf{b} \in \mathbb{R}^d$ comes in.

Sparse coding: $\mathbf{b} = A\mathbf{x}$, with $\mathbf{x}$ sparse
$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$$

Sparse sensing: $\mathbf{y} = W\mathbf{b}$, with $W \in \mathbb{R}^{p \times d}$, $p < d$, $\mathbf{y} \in \mathbb{R}^p$
$$\mathbf{y} = W\mathbf{b} = WA\mathbf{x} = Q\mathbf{x}, \;\text{with}\; \mathbf{x} \;\text{sparse}$$
$$\mathbf{x}^{**} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0, \;\text{s.t.}\; \|Q\mathbf{x} - \mathbf{y}\|_2^2 = 0$$

$\mathbf{x}^* = \mathbf{x}^{**}$. Note: $p$ is chosen based on the sparsity of the data (on $k$).
Sparse Sensing

(Figure: $\mathbf{b} \in \mathbb{R}^d$ is measured as $\mathbf{y} = W\mathbf{b} \in \mathbb{R}^p$, $p < d$; since $\mathbf{b} = A\mathbf{x}$ with $\mathbf{x}$ sparse, $\mathbf{y} = WA\mathbf{x} = Q\mathbf{x}$, and $\mathbf{x}^* = \mathbf{x}^{**}$.)
Sparse Sensing VS. Sparse Coding
• Sparse sensing (compressed sensing):
It is expensive (in time or money) to acquire $\mathbf{b}$ directly, so acquire $\mathbf{y}$ first and then recover $\mathbf{b}$.
• Sparse coding (sparse representation):
Believe that the sparsity property exists in the data; otherwise the sparse representation means nothing.
$\mathbf{x}$ is used as the feature of $\mathbf{b}$.
$\mathbf{x}$ can be used to efficiently store and reconstruct $\mathbf{b}$.
The Solution
How to Get The Sparse Solution?
• There is no algorithm other than exhaustive search to solve:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$$

• However, in some situations (e.g., special forms of $A$), the solution of $l_1$ minimization approaches that of $l_0$ minimization:

$$\mathbf{x}^{***} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1 = \sum_{n=1}^{N} |x^{(n)}|, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0$$
$$\mathbf{x}^{***} = \mathbf{x}^*$$
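The $l_1$ problem with an exact-fit constraint is a linear program (basis pursuit). A sketch with made-up sizes, using the standard splitting $|x_i| \le t_i$:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, N = 10, 30
A = rng.standard_normal((d, N))
x_true = np.zeros(N)
x_true[[4, 17]] = [1.0, -2.0]       # a 2-sparse ground truth
b = A @ x_true

# min ||x||_1 s.t. Ax = b, as an LP over [x; t] with |x_i| <= t_i:
c = np.concatenate([np.zeros(N), np.ones(N)])       # minimize sum(t)
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])                #  x - t <= 0, -x - t <= 0
b_ub = np.zeros(2 * N)
A_eq = np.hstack([A, np.zeros((d, N))])             #  Ax = b
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
              bounds=[(None, None)] * N + [(0, None)] * N)
x_hat = res.x[:N]
```

With a Gaussian $A$ and a sufficiently sparse ground truth, the $l_1$ minimizer typically coincides with the $l_0$ one, so `x_hat` usually recovers `x_true` here.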
Why l1?
• Question 1: Why can $l_1$ result in a sparse solution?

$$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0 \quad\text{vs.}\quad \arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2, \;\text{s.t.}\; \|\mathbf{x}\|_1 \le c$$

(Figure: in the $(w_1, w_2)$ plane, the contours of $\|A\mathbf{x} - \mathbf{b}\|_2^2$ meet the $l_1$ ball $\|\mathbf{x}\|_1 \le c$ at a corner, i.e., a sparse point, whereas the $l_2$ ball $\|\mathbf{x}\|_2 \le c$ is touched at a generic, non-sparse point.)
Why l1?
• Question 2: Why does the sparse solution achieved by $l_1$ minimization approach that of $l_0$ minimization?
This is a matter of mathematics.
In any case, sparse representation based on $l_1$ minimization has been widely used for pattern recognition.
In addition, if one doesn't care about using the sparse solution for representation (as a feature), it seems OK if the two solutions are not the same, since both reconstruct $\mathbf{b}$:

$$\mathbf{b} = A\mathbf{x}^{***}, \qquad \mathbf{b} = A\mathbf{x}^*$$
Noise
• Sometimes, the data is observed with noise:

$$\mathbf{b}^* = \mathbf{b} + \text{noise} = [\mathbf{a}_1 \; \cdots \; \mathbf{a}_N] \, \mathbf{x} + \text{noise}, \quad \|\mathbf{x}\|_0 \le k$$

Does $l_0$ ($l_1$) minimization still recover $\mathbf{x}$?
• The answer seems to be negative:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}^*\|_2^2 = 0$$

is usually not sparse, and is neither equal nor close to the noiseless solution.
Noise
• Several ways to overcome this:

$$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 = 0 \;\;\Rightarrow\;\; \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 \le c$$
$$\text{or}\quad \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_1 \le c$$
$$\text{or model the noise explicitly:}\quad \arg\min_{\mathbf{x}, \mathbf{z}} \left\| \begin{bmatrix} \mathbf{x} \\ \mathbf{z} \end{bmatrix} \right\|_1, \;\text{s.t.}\; \left\| [A \,|\, I] \begin{bmatrix} \mathbf{x} \\ \mathbf{z} \end{bmatrix} - \mathbf{b} \right\|_2^2 = 0$$

• What is the difference between $\|A\mathbf{x} - \mathbf{b}\|_2 \le c$ and $\|A\mathbf{x} - \mathbf{b}\|_1 \le c$?
Equivalent form
• You may also see several equivalent forms of the problem:

$$\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1, \;\text{s.t.}\; \|A\mathbf{x} - \mathbf{b}\|_2^2 \le c$$
$$\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2 + \lambda \|\mathbf{x}\|_1$$
$$\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2, \;\text{s.t.}\; \|\mathbf{x}\|_1 \le d$$

• These equivalent forms are derived from the Lagrange multiplier.
• There have been several publications on how to solve the $l_1$ minimization problem.
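One classic solver for the penalized form is iterative soft-thresholding (ISTA); a minimal numpy sketch (the problem sizes, $\lambda$, and iteration count below are illustrative choices, not from the slides):

```python
import numpy as np

def ista(A, b, lam, n_iter=1000):
    """Iterative soft-thresholding for  min_x ||Ax - b||_2^2 + lam * ||x||_1."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - step * 2.0 * A.T @ (A @ x - b)       # gradient step on the fit
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # shrinkage
    return x

# toy usage on a synthetic sparse problem
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x0 = np.zeros(50)
x0[[3, 7]] = [1.0, -2.0]
b = A @ x0
x_hat = ista(A, b, lam=0.1)
```

Each iteration alternates a gradient step on the smooth $l_2$ term with a soft-threshold that pushes small coefficients exactly to zero, which is where the sparsity comes from.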
The Importance of Dictionary
Dictionary generation
• In the preceding sections, we generally assumed that the (over-complete) basis $A$ exists and is known.
• However, in practice we usually need to build it:
Wavelet + Fourier + Haar + ……
Learning based on data
• How to learn?

Given a training set $\{\mathbf{b}^{(i)} \in \mathbb{R}^d\}_{i=1}^{N}$, form $B = [\mathbf{b}^{(1)} \; \mathbf{b}^{(2)} \; \cdots \; \mathbf{b}^{(N)}]$:

$$A^*, X^* = \arg\min_{A, X} \|B - AX\|_F^2 + \lambda \|X\|_1, \;\text{where}\; X = [\mathbf{x}^{(1)} \; \mathbf{x}^{(2)} \; \cdots \; \mathbf{x}^{(N)}]$$

• May result in over-fitting.
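One common way to attack this joint problem is alternating minimization: sparse-code $X$ with the dictionary fixed, then update $A$ by least squares with the codes fixed. A sketch under made-up sizes and hyperparameters (this is one simple scheme, not the only dictionary-learning algorithm):

```python
import numpy as np

def soft(Z, t):
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def learn_dictionary(B, n_atoms, lam=0.05, n_outer=15, n_inner=50):
    """Alternating minimization for  min_{A,X} ||B - AX||_F^2 + lam*||X||_1:
    ISTA on X (dictionary fixed), then least squares on A (codes fixed)."""
    d, N = B.shape
    rng = np.random.default_rng(0)
    A = rng.standard_normal((d, n_atoms))
    A /= np.linalg.norm(A, axis=0)                  # unit-norm atoms
    X = np.zeros((n_atoms, N))
    for _ in range(n_outer):
        step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
        for _ in range(n_inner):                    # sparse-coding step
            X = soft(X - step * 2.0 * A.T @ (A @ X - B), lam * step)
        A = B @ np.linalg.pinv(X)                   # dictionary-update step
        norms = np.linalg.norm(A, axis=0) + 1e-12
        A /= norms                                  # renormalize atoms and
        X *= norms[:, None]                         # rescale codes to match
    return A, X

# toy data drawn from a sparse model (sizes are illustrative)
rng = np.random.default_rng(1)
A0 = rng.standard_normal((10, 15))
X0 = np.where(rng.random((15, 100)) < 0.1, rng.standard_normal((15, 100)), 0.0)
B = A0 @ X0
A_hat, X_hat = learn_dictionary(B, n_atoms=15)
```

Renormalizing the atoms while rescaling the code rows keeps the product $AX$ unchanged; without some such constraint, the dictionary can grow arbitrarily to shrink the $l_1$ term.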
Applications
Back to the problem we saw earlier
• A quadratic programming problem with constraints:

$$\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2, \;\text{s.t.}\; x_i \ge 0, \quad A = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_N]$$

$\mathbf{x}$: the importance of each food; $\mathbf{b}$: personal nutrient need; $\mathbf{a}_i$: nutrient content of each food.

(1) Take the derivative (x): $\mathbf{x} = (A^T A)^{-1} A^T \mathbf{b}$, then choose the components with $x_i \ge 0$ — this does not solve the constrained problem.
(2) Quadratic programming (o)
(3) Sparse coding (o)
Face Recognition (1)
Face Recognition (2)
An important issue
• When using sparse representation as a way of feature extraction, you may wonder: even if the sparsity property exists in the data, does the sparse feature really lead to better results? Does it carry any semantic meaning?
• Successful areas:
Face recognition
Digit recognition
Object recognition (with careful design):
Ex. K-means → sparse representation
De-noising
Learn a patch dictionary. For each patch, compute the sparse representation, then use it to reconstruct the patch:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2 + \lambda \|\mathbf{x}\|_1$$
$$\hat{\mathbf{b}} = A\mathbf{x}^*$$
Detection based on reconstruction
Learn a patch dictionary for a specific object. For each patch in the image, compute the sparse representation and use it to reconstruct the patch. Check the reconstruction error for each patch, and identify those with small error as the detected object:

$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|_2^2 + \lambda \|\mathbf{x}\|_1$$
$$\text{check}\;\; \|\mathbf{b} - A\mathbf{x}^*\|_2^2$$

(The dictionary here may not be over-complete.)

Other cases: foreground-background detection, pedestrian detection, ……
Conclusion
What you should know
• What is the form of standard optimization?
• What is sparsity?
• What is sparse coding and sparse sensing?
• What kinds of optimization methods are used to solve them?
• Try to use them!!
Thank you for listening