Data Compressing

Upload: myoneslove

Post on 14-Apr-2018


TRANSCRIPT


Data Compression
(ECE 5546-41)

    Introduction to Compressive Sensing

    Byeungwoo Jeon

    Digital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

    2012 Fall


Course Introduction

• Data compression (before)
  - Main text: Introduction to Data Compression (3rd Ed.) (K. Sayood)
  - Main topics: mathematical preliminaries for lossless compression, Huffman coding, arithmetic coding, dictionary techniques, context-based compression, lossless image compression

• Data compression (this semester): Compressed Sensing
  - Texts:
    - R. Baraniuk, M. Davenport, M. Duarte, C. Hegde, An Introduction to Compressive Sensing, Connexions Web site, http://cnx.org/content/col11133/1.5/, Apr 2, 2011.
    - Compressed Sensing: Theory and Applications, edited by Y. C. Eldar and G. Kutyniok
    - Lecture Note, Introduction to Compressed Sensing, Spring 2011, by Prof. Heung-No Lee (http://infonet.gist.ac.kr)
    - Selected papers


Major Subjects to Cover

How to Study

• Basic framework of the course
  - Lecture (2 hours)
  - Paper investigation (1 hour): student presentation
    - Each student should study thoroughly and present at least one paper.
    - It should be completely understood by the presenter before presentation.
    - A list of papers will be provided by the instructor; however, a preferred paper can be suggested by the student.

• Grading policy
  - Attendance 10%
  - Project/Presentation 20%
  - Homework 10%
  - Exam (Midterm 30% + Final 30%) 60%


Very Brief Introduction to CS

(modified from a file by Igor Carron (version 2 - draft) at
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxpZ29yY2Fycm9uMnxneDoxYmNkZjU5MWQ2NmJkOGUy)


Solving Linear Equations

• Solving linear equations Y = AX (Y: measured; X: unknown; A: from the signal model)
  - The solution for X is easy unless the matrix A is non-invertible.
  - A nonlinear system can be approximated by a linear system of equations.
  - A continuous system can be discretized to a linear system of equations.
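The contrast between the square and the underdetermined case can be checked numerically. A minimal NumPy sketch (illustrative only; the sizes and matrices below are made up, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Square, invertible A: the solution of Y = AX is unique and easy to compute.
A = rng.standard_normal((4, 4))
X_true = rng.standard_normal(4)
Y = A @ X_true
X_hat = np.linalg.solve(A, Y)
print(np.allclose(X_hat, X_true))        # True

# Underdetermined A (more unknowns than equations): infinitely many solutions.
A_wide = rng.standard_normal((3, 8))
Y_wide = A_wide @ rng.standard_normal(8)
X_min_norm = np.linalg.pinv(A_wide) @ Y_wide      # one particular (minimum L2-norm) solution
print(np.allclose(A_wide @ X_min_norm, Y_wide))   # True, but this solution need not be sparse
```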


Rf: Regularization (3)

• In statistics and machine learning, regularization is used to prevent overfitting. Typical examples of regularization in statistical machine learning include ridge regression, the lasso, and the L2 norm in support vector machines.

• Regularization methods are also used for model selection, where they work by implicitly or explicitly penalizing models based on the number of their parameters. For example, Bayesian learning methods make use of a prior probability that (usually) gives lower probability to more complex models. Well-known model selection techniques include the Akaike information criterion (AIC), minimum description length (MDL), and the Bayesian information criterion (BIC). Alternative methods of controlling overfitting not involving regularization include cross-validation.

http://en.wikipedia.org/wiki/Regularization_(mathematics)

Compressed Sensing

• An instance of an underdetermined system of linear equations Y = AX is a compressed sensing system.
  - Y ~ compressed measurements (few)
  - A ~ sensing (in the form of linear combinations)
  - X ~ original information (what we would like to find)

• The recovery of a sparse solution to an underdetermined system of linear equations is performed using compressed sensing reconstruction techniques/solvers.

• Key question: Do all underdetermined systems of linear equations admit a very sparse and unique solution?
  - Answer: Some systems do, under a condition (RIP, NSP, ...).

• Issues to study (in this semester)
  - Mathematical background
  - Checking the condition
  - Recovery algorithms
  - Implementing the algorithms
  - Applications


Data Compression
(ECE 5546-41)

    2012 Fall

    Digital Media Lab.

    Ch2. Sparse and Compressible Signal Models

    Byeungwoo Jeon

    Digital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

What we like to cover in this class

(Algorithms for sparse analysis: Lecture I: Background on sparse approximation, by Anna C. Gilbert, Department of Mathematics, University of Michigan)


Underdetermined Linear Equations

• Solving linear equations Y = AX (Y: measured; X: unknown; A: from the signal model)

• Underdetermined case:
  - Too few equations and too many unknowns means an infinite number of solutions (i.e., matrix A cannot be inverted as in the square case).

• CS tries to solve this under the condition of sparseness.
  - Compressed sensing reconstruction techniques allow one to find a solution that is sparse, i.e., that has the property of having very few non-zero elements (the rest of the elements are zeros).

• We need to evaluate the fitness of a solution: need a measure (~ norm).
• Need to formally define the signal model, sparseness, compressibility, etc.
  - Need lots of concepts from linear algebra.

Compressed Sensing

• An instance of an underdetermined system of linear equations Y = AX is a compressed sensing system.
  - Y ~ compressed measurements (few); A ~ sensing (linear combinations); X ~ original information (what we would like to find).

• The recovery of a sparse solution to an underdetermined system of linear equations is performed using compressed sensing reconstruction.

• Key question: Do all underdetermined systems of linear equations admit a very sparse and unique solution?
  - Answer: Some systems do, under a condition (RIP, NSP, ...).

• Issues to study (in this semester): mathematical background, checking the condition, recovery algorithms, implementing the algorithms, applications.


Vector in 2-D (or 3-D) World

• Vector: a directed line segment (direction & magnitude)
  - In 2-space
  - In 3-space
  (Figure: an arrow from an initial point to a terminal point.)

• Vector addition and scalar multiplication

• Vector norm: If v is a vector, then the magnitude of the vector is called the norm of the vector and denoted by ||v||. Furthermore, if v is a vector in 2-space (or 3-space), then
  $\|v\| = \sqrt{v_1^2 + v_2^2}$ (in 2-space);  $\|v\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$ (in 3-space).

• Dot product: If u and v are two vectors in 2-space (or 3-space), and the angle between them is θ, then the dot product is defined as
  $u \cdot v = \|u\|\,\|v\|\cos\theta$.
  - It is sometimes called the scalar product or Euclidean inner product.

Extension to N-space (1)

• Definition of n-space: For a given positive integer n, an ordered n-tuple is a sequence of n real numbers denoted by $(a_1, a_2, \ldots, a_n)$. The complete set of all ordered n-tuples is called n-space and is denoted by $R^n$.
  - It is a natural extension of 2-space and 3-space.

• Definition of arithmetic operations in n-space:


Extension to N-space (2)

• Definition of Euclidean inner product: For two vectors $u = (u_1, u_2, \ldots, u_n)$ and $v = (v_1, v_2, \ldots, v_n)$ in $R^n$, the Euclidean inner product is defined as
  $u \cdot v = \langle u, v \rangle = \sum_{i=1}^{n} u_i v_i$.
  - It is a natural extension of the dot product in 2-space.
  - It can be written in matrix form as follows (suppose u and v are column vectors): $u \cdot v = v^T u$.

• Note that when we add addition, scalar multiplication and the Euclidean inner product to n-space, it is often called Euclidean n-space.

Extension to N-space (3)

• Let's extend the concepts of norm and distance to n-space.

• Definition: For a vector $u = (u_1, u_2, \ldots, u_n) \in R^n$, the Euclidean norm is defined as
  $\|u\| = \sqrt{u \cdot u} = \sqrt{\sum_{i=1}^{n} u_i^2}$.

• Definition: For two vectors $u, v \in R^n$, the Euclidean distance between the two points indicated by the two vectors is defined as
  $d(u, v) = \|u - v\| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}$.


Generalization to Vector Space

• Up to now, we have had a good geometric analogy, especially in 2-space (or 3-space), coming from the notion that a vector is interpreted as a directed line segment.

• A vector, however, is a much more general concept, and it doesn't necessarily have to represent a directed line segment as before.
  - For example, a vector can be a matrix or a function, and those are only a couple of the possibilities for vectors.
  - Nor does a vector have to represent the vectors we looked at in Rn (that is, a vector may not be in Rn; therefore, it is a more general object).

• The concept of n-space is now generalized into a vector space.
  - A vector space is nothing more than a collection of vectors (whatever those now are) that satisfies a set of axioms.
  - Once we get the general definition of a vector and a vector space out of the way, we'll look at many of the important ideas that come with vector spaces.

Vector Space (1)

• Definition: Let V be a set on which addition and scalar multiplication are defined (this means that if u and v are objects in V and c is a scalar, then we've defined u + v and cu in some way). If the following axioms are true for all objects u, v, w in V and all scalars c and k, then V is called a vector space and the objects in V are called vectors.
  (a) u + v is in V (closure under addition).
  (b) cu is in V (closure under scalar multiplication).
  (c) u + v = v + u (commutativity of addition).
  (d) u + (v + w) = (u + v) + w (associativity of addition).
  (e) There is a special object in V, denoted 0 and called the zero vector, such that for all u in V we have u + 0 = 0 + u = u.
  (f) For every u in V there is another object in V, denoted -u and called the negative of u, such that u + (-u) = 0.
  (g) c(u + v) = cu + cv (distribution over vector addition).
  (h) (c + k)u = cu + ku (distribution over scalar addition).
  (i) c(ku) = (ck)u.
  (j) 1u = u.

• A vector space is simply a collection of vectors satisfying the axioms above.


Vector Space (2)

• Note
  - No need to be locked into the standard ways of defining addition and scalar multiplication. For the most part we will be doing addition and scalar multiplication in a fairly standard way, but there will be the occasional example where we won't.
  - In order for something to be a vector space it simply must have an addition and scalar multiplication that meet the above axioms, and it doesn't matter how strange the addition or scalar multiplication might be.
  - When the scalars in the definition are complex numbers, it is called a complex vector space. In the same way, when we restrict the scalars to real numbers we generally call the vector space a real vector space.

• Ex1: If n is any positive integer, then the set V = $R^n$ with the standard addition and scalar multiplication as defined in the Euclidean n-space section is a vector space.

• Ex2: Show that the set V = $R^2$ with the standard scalar multiplication and an addition defined as $(u_1, u_2) + (v_1, v_2) = (u_1 + 2v_1, u_2 + v_2)$ is not a vector space.

Rf: Signal and Vector Space

• Many natural and man-made systems can be modeled well as linear.

• Model such a linear structure using a linear model by treating a signal as a vector in a vector space.
  - The vector space model can capture the linear structure well.
  - This modeling allows us to apply intuitions and tools from the geometry of 3-space, such as length, distance, angles, etc.
  - This is useful when the signal lives in high-dimensional or infinite-dimensional spaces.


Inner Product

• Generalization of the concept of the dot product (or inner product) in n-space to a general vector space:

Norm in Vector Space

• A norm is a function that assigns a strictly positive length or size to all vectors in a vector space, other than the zero vector (which has zero length assigned to it).
  - A simple example is the 2-dimensional Euclidean space $R^2$ equipped with the Euclidean norm. The Euclidean norm assigns to each vector the length of the vector. Because of this, the Euclidean norm is often known as the magnitude.
  - A vector space with a norm is called a normed vector space.

• Definition of Norm: Given a vector space V over a subfield F of the complex numbers, a norm on V is a function p: V → R with the following properties: for all a ∈ F and all u, v ∈ V,
  - P1: p(av) = |a| p(v) (positive homogeneity or positive scalability).
  - P2: p(u + v) ≤ p(u) + p(v) (triangle inequality).
  - P3: If p(v) = 0, then v is the zero vector (separates points).

• A simple consequence of the first two axioms, positive homogeneity and the triangle inequality, is p(0) = 0 and thus p(v) ≥ 0 (positivity).


Examples of Norm (2)

• Zero norm: In signal processing and statistics, David Donoho referred to the zero "norm" with quotation marks.
  - supp(x): support of x (the set of indices indicating the non-zero components of x)
  - $\|u\|_0 = |\mathrm{supp}(u)|$, where $\mathrm{supp}(u) = \{\, i : u_i \ne 0 \,\}$
  - Following Donoho's notation, the zero "norm" of x is simply the number of non-zero coordinates of x, or the Hamming distance of the vector from zero.
  - When this "norm" is localized to a bounded set, it is the limit of p-norms as p approaches 0.
  - Of course, the zero "norm" is not a B-norm, because it is not positively homogeneous. It is not even an F-norm, because it is discontinuous, jointly and severally, with respect to the scalar argument in scalar-vector multiplication and with respect to its vector argument.
  - Abusing terminology, some engineers omit Donoho's quotation marks and inappropriately call the number-of-nonzeros function the L0 norm (sic), also misusing the notation for the Lebesgue space of measurable functions.

http://en.wikipedia.org/wiki/Norm_(mathematics)

Examples of Norm (3)

• p-norm (for p ≥ 1, a real number): $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$
  - Note that for p = 1 we get the taxicab norm, for p = 2 we get the Euclidean norm, and as p approaches infinity, the p-norm approaches the infinity norm or maximum norm.
  - This definition is still of some interest for 0 < p < 1, although the resulting function does not define a norm (it violates the triangle inequality).


Examples of Norm (4)

• Lp norm: For $p \in [1, \infty]$,
  $\|u\|_p = \left( \sum_{i=1}^{n} |u_i|^p \right)^{1/p}$ for $p \in [1, \infty)$,   $\|u\|_\infty = \max_{i = 1, \ldots, n} |u_i|$.
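As a quick numerical illustration (a minimal sketch, not from the slides), the common norms of a small vector can be computed with NumPy; the zero "norm" is simply a count of nonzeros:

```python
import numpy as np

u = np.array([3.0, -4.0, 0.0, 1.0])

l1   = np.linalg.norm(u, 1)        # taxicab norm: sum |u_i|          -> 8.0
l2   = np.linalg.norm(u, 2)        # Euclidean norm                   -> ~5.10
linf = np.linalg.norm(u, np.inf)   # maximum norm: max |u_i|          -> 4.0
l0   = np.count_nonzero(u)         # zero "norm": number of nonzeros  -> 3 (not a true norm)

print(l1, l2, linf, l0)
```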

Properties of Norms

• The concept of the unit circle (the set of all vectors of norm 1) is different in different norms.
  - For the 1-norm, the unit circle in R2 is a square.
  - For the 2-norm (Euclidean norm), it is the well-known unit circle.
  - For the infinity norm, it is a different square.
  - For any p-norm, it is a superellipse (with congruent axes).
  - Due to the definition of the norm, the unit circle is always convex and centrally symmetric (therefore, the unit ball may be a rectangle but cannot be a triangle).

• Illustration of unit circles in different norms (figure not included in the transcript).


2.2 BASES AND FRAMES

Linear Independence

• Linear independence

• A finite set of vectors that contains the zero vector will be linearly dependent.

• Suppose that $S = \{v_1, \ldots, v_k\}$ is a set of vectors in $R^n$. If k > n, then the set of vectors is linearly dependent.


Orthogonality and Basis (2)

• Any vector in an inner product space with an orthogonal/orthonormal basis can be easily represented as a linear combination of the basis vectors.

Orthogonal Complement (1)

• Definition of orthogonal complement: Suppose that W is a subspace of an inner product space V. We say that a vector u from V is orthogonal to W if it is orthogonal to every vector in W. The set of all vectors that are orthogonal to W is called the orthogonal complement of W and is denoted by $W^\perp$.
  - We say that W and $W^\perp$ are orthogonal complements.

• Theorem


Orthogonal Complement (2)

• Extension of projection

• Theorem

In Matrix Form

• Given a basis set $\{\phi_i\}_{i=1}^{n}$, any vector x in $R^n$ is uniquely represented as
  $x = \sum_{i=1}^{n} c_i \phi_i$.

• Form an n×n matrix $\Phi$ with columns given by the $\phi_i$'s, and let c denote the length-n vector with entries $c_i$; the matrix representation is
  $x = \Phi c$.

• An orthonormal basis should satisfy $\langle \phi_i, \phi_j \rangle = \delta(i - j)$.
  - Therefore, $c_i = \langle x, \phi_i \rangle$.
  - In matrix form, $c = \Phi^T x$ (note that orthonormality means $\Phi^T \Phi = I$).
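A minimal sketch of this analysis/synthesis pair, assuming an orthonormal basis built with a QR factorization (illustrative only, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

Phi, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns phi_i form an orthonormal basis

x = rng.standard_normal(n)
c = Phi.T @ x        # analysis: c_i = <x, phi_i>, valid because Phi is orthonormal
x_rec = Phi @ c      # synthesis: x = Phi c

print(np.allclose(x_rec, x))                  # True
print(np.allclose(Phi.T @ Phi, np.eye(n)))    # True: Phi^T Phi = I
```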


Dictionary

• A dictionary in $R^n$ is a collection of unit-norm vectors $\{\varphi_i\}_{i=1}^{N}$, with $\|\varphi_i\|_2 = 1$.
  - Each element $\varphi_i$ is called an atom.
  - If $\{\varphi_i\}$ spans $R^n$, the dictionary is complete.
  - If the $\{\varphi_i\}$ are linearly dependent, the dictionary is redundant.

• In the sparse approximation literature, it is also common for a basis or frame to be referred to as a dictionary or over-complete dictionary, respectively, with the dictionary elements being called atoms.

2.3 SPARSE REPRESENTATION


K-Sparse

• Definition of K-sparse: A signal is called K-sparse if it has at most K non-zero components, i.e., $\|x\|_0 \le K$.

• Note that a signal x itself may not appear K-sparse; we still refer to x as being K-sparse, with the understanding that x can be expressed as K-sparse through a linear transformation (i.e., $x = \Psi\alpha$ with $\|\alpha\|_0 \le K$).

• Ex: x(t) = cos(wt)
  - Time domain
  - Fourier domain
  - DCT domain
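The cosine example can be checked numerically: the signal is dense in time but 1-sparse in the Fourier domain. A minimal sketch (assumed sampling parameters, not from the slides):

```python
import numpy as np

N = 256
t = np.arange(N)
x = np.cos(2 * np.pi * 8 * t / N)           # x(t) = cos(wt): not sparse in the time domain

X = np.fft.rfft(x)                          # Fourier-domain coefficients
support = np.flatnonzero(np.abs(X) > 1e-6)

print(np.count_nonzero(np.abs(x) > 1e-6))   # ~N nonzeros in time
print(support)                              # [8]: a single active frequency bin (1-sparse)
```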

Ex: Sparse Representation of Images

• Sparse representation of an image via a multiscale wavelet transform.
  - Note that most of the wavelet coefficients are close to zero.

(Figure: (a) original image; (b) wavelet representation (larger coefficient -> lighter pixel). Fig. 1.3, Compressed Sensing by Y. Eldar et al.)


Ex: Sparse Approx. of a Natural Image

• Sparse approximation of a natural image.

(Figure: (a) original image; (b) approximation keeping only the 10% largest wavelet coefficients. Fig. 1.4, Compressed Sensing by Y. Eldar et al.)

Set of K-Sparse Signals

• Set of all K-sparse signals: $\Sigma_K = \{\, x : \|x\|_0 \le K \,\}$

• Q: Is the set $\Sigma_K$ a linear space?
  - That is, for any pair of vectors x, z in $\Sigma_K$, does x + z also belong to $\Sigma_K$?

• See Fig. 1.5 (Compressed Sensing by Y. Eldar et al.).


Sparseness of Image

• Most natural images are characterized by large smooth or textured regions with relatively few sharp edges.
  - Signals with this structure are known to be very nearly sparse when represented using a multiscale wavelet approximation.

• K-term approximation
  - Need a measure (i.e., an appropriate norm) to quantify the error.
  - This kind of approximation is non-linear (since the choice of which coefficients to keep in the approximation depends on the signal itself).

2.4 COMPRESSIBLE SIGNALS


Compressible vs. Sparse

• Few real-world signals are truly sparse; rather, they are compressible (meaning that they can be well approximated by sparse signals).
  - The following terms mean the same concept: compressible, approximately sparse, relatively sparse.

• Quantification of the compressibility by calculating the error incurred by approximating a signal x by some $\hat{x} \in \Sigma_K$:
  $\sigma_K(x)_p = \min_{\hat{x} \in \Sigma_K} \|x - \hat{x}\|_p$
  - If $x \in \Sigma_K$, then $\sigma_K(x)_p = 0$ for any p.
  - Thresholding (keeping only the K largest coefficients) gives the optimal approximation for all p.
  - Choose a basis set such that the coefficients obey a power-law decay.

Compressibility (1)

• Definition of compressibility: A signal is called compressible if its sorted coefficient magnitudes in $\Psi$ decay rapidly:
  $x = \Psi\alpha$ with $|\alpha_1| \ge |\alpha_2| \ge \cdots \ge |\alpha_n|$.

• Power-law decay: suppose there exist $C_1$ and q > 0 such that
  $|\alpha_s| \le C_1\, s^{-q}, \quad s = 1, 2, \ldots$
  - A larger q means faster magnitude decay, and the more compressible the signal is.
  - Under power-law decay, a signal can be approximated pretty well for K ≪ n.


Compressibility (2)

• Depending on the space (referred to by $\Psi$), the signal can be either compressible or not.
  - Therefore, a proper choice of the space is important.

• Q: For such a compressible signal (K-approximated), there exist constants $C_2$ and r > 0 depending only on $C_1$ and q such that
  $\sigma_K(x)_2 \le C_2\, K^{-r}$.

K-term Approximation

• Only the K largest coefficients are kept, while the others are set to zero, to represent the given signal.

• K-term approximation error:
  $\sigma_K(x)_2 = \min_{\alpha \in \Sigma_K} \|x - \Psi\alpha\|_2$
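A minimal numerical sketch of K-term approximation by thresholding, using synthetic power-law coefficients (illustrative only; the decay parameters are made up, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, q = 1024, 32, 1.0

# Coefficients obeying a power-law decay |alpha_s| <= C1 * s^(-q): compressible, not exactly sparse.
alpha = (np.arange(1, n + 1) ** -q) * rng.choice([-1.0, 1.0], size=n)

# Best K-term approximation: keep the K largest-magnitude coefficients, zero the rest.
alpha_K = np.zeros_like(alpha)
keep = np.argsort(np.abs(alpha))[-K:]
alpha_K[keep] = alpha[keep]

sigma_K = np.linalg.norm(alpha - alpha_K)   # K-term approximation error in the coefficient domain
print(sigma_K)                              # small, and it shrinks further as K grows
```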


    -- end --


Data Compression
(ECE 5546-41)

    2012 Fall

    Digital Media Lab.

    Ch 3. Sensing Matrices

    Byeungwoo JeonDigital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

    Digital Media Lab.

    What we are doing?


Sparse vs. Compressible (1)

• Recall that we call a signal x K-sparse if it has at most K non-zeros: $\Sigma_K = \{\, x : \|x\|_0 \le K \,\}$, where $\|x\|_0 = |\mathrm{supp}(x)| = \lim_{p \to 0} \|x\|_p^p$.
  - A K-sparse signal may not itself be sparse, but it admits a sparse representation in some basis: $x = \Psi s$.
  (Figure: example with K = 4, showing x = Ψ s with only 4 non-zero entries in s.)

• But few real-world signals are truly sparse.
  - Most signals can be represented as compressible signals.

Sparse vs. Compressible (2)

• A compressible signal means that the vector of coefficients in a certain basis has few large coefficients, with the other coefficients having small values.
  - If we set the small coefficients to zero, the remaining large coefficients can represent the original signal with hardly noticeable perceptual loss.
  (Figure: sparse vs. compressible coefficient profiles.)


Compressed Sensing (1)

• Compressed sensing measurement process: y = Φx
  - y: measurement vector (M×1)
  - Φ: measurement (sensing) matrix (M×N)
  - x: input signal vector in its original domain (e.g., time or spatial) (N×1)
  - (CS is also possible for continuous-time signals.)

• Φ represents a dimensionality reduction (it maps $R^N$ into $R^M$, with M < N).
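A minimal sketch of this measurement process with a random Gaussian sensing matrix (illustrative sizes, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 256, 64, 8

x = np.zeros(N)                                        # K-sparse signal in its original domain
x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)         # random sensing matrix, entry variance 1/M
y = Phi @ x                                            # M < N compressive measurements

print(Phi.shape, y.shape)                              # (64, 256) (64,)
```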


Compressed Sensing (3)

• Measurement process with a sparse x:
  - There are a small number of columns corresponding to the non-zero coefficients.
  - The measurement vector y is a linear combination of these columns.

• Ex: The following is an underdetermined system
  - 4-sparse, N unknowns
  - Fewer equations than unknowns (M < N)


Questions to Answer (2)

• We further need to study many issues:

• Q1: How to design an M×N sensing matrix Φ to ensure that it preserves the information in the signal x?
  - A sensing matrix is designed to reduce the number of measurements as much as possible while allowing the recovery of a wide class of signals x from their measurements.
  - Sensing matrix design problem

• Q2: How small an M can we choose, given K and N?

• Q3: How sparse does the signal have to be for a given M?

• Q4: How to recover the original signal x from the measurements y?
  - Look for fast and robust algorithms. Signal recovery problem

• Q5: When will the L1 convex relaxation solution attain the L0 solution?

Questions to Answer (3)

• After answering the previous questions, we further need to investigate yet other issues:
  - How to incorporate measurement noise in the signal model y = Φx?
  - What would happen to L1-minimization signal recovery?
  - Reliability issues in signal recovery
  - What would happen if there is a model mismatch?
  - If the signal is not exactly a K-sparse signal, what kind of results do we expect under such an assumption?


    Digital Media Lab.

    Design of Sensing Matrix

1. Null Space Property
2. Restricted Isometry Property
3. Bounded Coherence Property

1. Null Space Conditions

• Q: We would like to design Φ so that we can recover all sparse signals x corresponding to the measurements y. What condition on Φ do we need?

• Definition of the null space of Φ: $N(\Phi) = \{\, z : \Phi z = 0 \,\}$

• Uniqueness condition for the solution of y = Φx:
  $N(\Phi)$ contains no vector in $\Sigma_{2K}$  ⟺  Φ uniquely represents all $x \in \Sigma_K$.

• Proof:
  (Figure: two distinct sparse signals mapping to the same measurement y.) In that case there is no way to find all signals x from the measurements y; distinct x must mean distinct measurement vectors.


Spark

• Definition of spark: The spark of a given matrix Φ is the smallest number of columns of Φ that are linearly dependent.

• Spark
  - A term coined by Donoho & Elad (2003).
  - It is a way of characterizing the null space of Φ using the L0 norm.
  - It is very complex to obtain (compared to the rank), since it calls for a combinatorial search over all possible subsets of columns of Φ.
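The combinatorial cost mentioned above is easy to see in code. A minimal brute-force sketch for a tiny matrix (illustrative only; the helper function below is not from the slides):

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    # Smallest number of linearly dependent columns, found by exhaustive search.
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            if np.linalg.matrix_rank(Phi[:, cols], tol=tol) < k:
                return k
    return N + 1            # all columns independent (only possible when N <= M)

Phi = np.array([[1.0, 0.0, 1.0, 1.0],
                [0.0, 1.0, 1.0, 2.0]])
print(spark(Phi))           # 3: no zero column, no two parallel columns,
                            # but any three columns in R^2 are linearly dependent
```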

More on Spark (1)

• Solving an underdetermined equation y = Φx (Φ ~ M×N, N >> M):
  $\begin{pmatrix} y_1 \\ \vdots \\ y_M \end{pmatrix} = \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_N \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix}$

• $y = \Phi x = \sum_i x_i \phi_i = \sum_{k \in A} x_k \phi_k + \sum_{j \in B} x_j \phi_j$
  - Term A: its corresponding columns of Φ are linearly dependent.
  - Term B: its corresponding columns of Φ are linearly independent.


More on Spark (2)

• Note that any vector x can be represented as $x = x_A + x_B$, where $x_A$ collects the entries of x on the linearly dependent columns (term A) and $x_B$ the entries on the linearly independent columns (term B); the zero vector is not considered, since it is a trivial case.

• Note that $2 \le \mathrm{spark}(\Phi) \le M + 1$.

Spark Condition

• Theorem: For any vector $y \in R^M$, there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$ if and only if $\mathrm{spark}(\Phi) > 2K$.
  - (This is an equivalent way of characterizing the null space condition.)

• This theorem guarantees uniqueness of representation for K-sparse signals.
  - It has combinatorial computational complexity, since it must verify that all sets of columns of a certain size are linearly independent.

• (Proved by D. L. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via l1 minimization.")

• The spark provides a complete characterization of when sparse recovery is possible. However, when dealing with approximately sparse signals (i.e., compressible signals), we must consider somewhat more restrictive conditions on the null space of Φ.


Proof of Spark Condition

• (⇐) Suppose spark(Φ) > 2K.
  - Assume that for some y there exist $x, \hat{x} \in \Sigma_K$ such that $y = \Phi x = \Phi\hat{x}$, i.e., $\Phi(x - \hat{x}) = 0$.
  - Letting $h = x - \hat{x}$, we can write this as $\Phi h = 0$.
  - Since spark(Φ) > 2K, all sets of up to 2K columns of Φ are linearly independent, and therefore $h = 0$, i.e., $x = \hat{x}$.

Corollary to Spark Condition

• Corollary: $M \ge 2K$.

• Proof: $2 \le \mathrm{spark}(\Phi) \le M + 1$ and $\mathrm{spark}(\Phi) > 2K$ imply $2K < M + 1$, i.e., $2K \le M$.


More on Spark Condition

• The spark provides a complete characterization of when sparse recovery is possible:
  there exists at most one signal $x \in \Sigma_K$ s.t. $y = \Phi x$  ⟺  $\mathrm{spark}(\Phi) > 2K$.

• However, when dealing with approximately sparse signals (i.e., compressible signals), we must consider somewhat more restrictive conditions on the null space of Φ.
  - We must also ensure that N(Φ) does not contain any vectors that are too compressible, in addition to vectors that are sparse. → Null space property

• Notation
  - $\Lambda \subset \{1, 2, \ldots, N\}$: a subset of indices; $\Lambda^C = \{1, 2, \ldots, N\} \setminus \Lambda$.
  - $x_\Lambda$: the length-N vector obtained by setting to 0 the entries of x indexed by $\Lambda^C$.
  - $\Phi_\Lambda$: the M×N matrix obtained by setting to zero the columns at the positions indexed by $\Lambda^C$.

2. Null Space Property (NSP)

• Definition of null space property (NSP) of order K
  - A matrix Φ satisfies the null space property (NSP) of order K if there exists a constant C > 0 such that
    $\|h_\Lambda\|_2 \le C \,\dfrac{\|h_{\Lambda^C}\|_1}{\sqrt{K}}$
    holds for all $h \in N(\Phi)$ and for all Λ such that $|\Lambda| \le K$.

• Rf: For example, with $\Lambda = \{1, 3\}$ and $h = (h_1, h_2, h_3, h_4, \ldots, h_n)^T$,
  $h_\Lambda = (h_1, 0, h_3, 0, \ldots, 0)^T$, $h_{\Lambda^C} = (0, h_2, 0, h_4, \ldots, h_n)^T$, and $h = h_\Lambda + h_{\Lambda^C}$.


Null Space Property (NSP)

• The NSP implies that vectors in the null space of Φ should not be too concentrated on a small subset of indices.

• If a vector h is exactly K-sparse, then there exists a Λ (its support, with $|\Lambda| \le K$) such that $h_{\Lambda^C} = 0$, and hence $\|h_{\Lambda^C}\|_1 = 0$.
  - Therefore, the NSP indicates that $\|h_\Lambda\|_2 = 0$, thus $h_\Lambda = 0$ as well.
  - This means that if a matrix Φ satisfies the NSP, then the only K-sparse vector in N(Φ) is h = 0.

NSP and Sparse Recovery

• How to measure the performance of sparse recovery algorithms when dealing with general non-sparse x?

• The following relationship under the NSP guarantees exact recovery of all possible K-sparse signals, but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals are approximated by K-sparse vectors:
  $\|\Delta(\Phi x) - x\|_2 \le C \,\dfrac{\sigma_K(x)_1}{\sqrt{K}}$
  - where $\Delta : R^M \to R^N$ represents a specific recovery method, and $\sigma_K(x)_p = \min_{\hat{x} \in \Sigma_K} \|x - \hat{x}\|_p$.


NSP Theorem

• Theorem: For a sensing matrix $\Phi : R^N \to R^M$ and an arbitrary recovery algorithm $\Delta : R^M \to R^N$,
  a pair (Φ, Δ) satisfies $\|\Delta(\Phi x) - x\|_2 \le C\,\dfrac{\sigma_K(x)_1}{\sqrt{K}}$  ⟹  Φ satisfies the NSP of order 2K.

Proof of NSP Theorem

• Suppose $h \in N(\Phi)$ and let Λ be the indices corresponding to the 2K largest entries of h. Split Λ into $\Lambda_0$ and $\Lambda_1$, where $|\Lambda_0| = |\Lambda_1| = K$.

• Set $x = h_{\Lambda_1} + h_{\Lambda^C}$ and $\hat{x} = -h_{\Lambda_0}$, so that $h = x - \hat{x}$.

• Since by construction $\hat{x} \in \Sigma_K$, we can apply the guarantee to obtain $\hat{x} = \Delta(\Phi\hat{x})$. Moreover, since $h \in N(\Phi)$, we have $\Phi h = \Phi(x - \hat{x}) = 0$, so that $\Phi\hat{x} = \Phi x$. Thus, $\hat{x} = \Delta(\Phi x)$. Finally, we have that
  $\|h_\Lambda\|_2 \le \|h\|_2 = \|x - \hat{x}\|_2 = \|x - \Delta(\Phi x)\|_2 \le C\,\dfrac{\sigma_K(x)_1}{\sqrt{K}} = \sqrt{2}\,C\,\dfrac{\|h_{\Lambda^C}\|_1}{\sqrt{2K}}$.

• If a matrix Φ satisfies the NSP, then the only 2K-sparse vector in N(Φ) is h = 0.


Restricted Isometry Property (RIP)

• When measurements are contaminated with noise or have been corrupted by some error such as quantization, it will be useful to consider somewhat stronger conditions.

• Candes and Tao introduced the isometry condition on matrices and established its important role in CS.

• In mathematics, an isometry is a distance-preserving map between metric spaces. Geometric figures which can be related by an isometry are called congruent.

Restricted Isometry Property (RIP)

• Definition of RIP
  - A matrix Φ satisfies the restricted isometry property (RIP) of order K if there exists a $\delta_K \in (0, 1)$ such that
    $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$
    holds for all $x \in \Sigma_K = \{\, x : \|x\|_0 \le K \,\}$.

• If a matrix Φ satisfies the RIP of order 2K, Φ approximately preserves the distance between any pair of K-sparse vectors.
  - Fundamental implications concerning robustness to noise.

• If a matrix Φ satisfies the RIP of order K with constant $\delta_K$, then for any K' < K we automatically have that Φ satisfies the RIP of order K' with constant $\delta_{K'} \le \delta_K$.

• If a matrix Φ satisfies the RIP of order K with a sufficiently small constant, then it will also automatically satisfy the RIP of order γK for certain γ, albeit with a somewhat worse constant.


The RIP and Stability

• Definition of C-stable: Let $\Phi : R^N \to R^M$ denote a sensing matrix and $\Delta : R^M \to R^N$ denote a recovery algorithm. A pair (Φ, Δ) is called C-stable if for any $x \in \Sigma_K$ and any $e \in R^M$, we have that
  $\|\Delta(\Phi x + e) - x\|_2 \le C\,\|e\|_2$.

• It says that if we add a small amount of noise to the measurements, then the impact of this on the recovered signal should not be arbitrarily large.

• As C → 1, Φ must satisfy the lower bound of
  $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$
  with $\delta_K = 1 - 1/C^2 \to 0$.
  - Thus, if we desire to reduce the impact of noise in the recovered signal, we must adjust Φ so that it satisfies the lower bound of the above inequality with a tighter constant.

The RIP and Stability

• Theorem: If a pair (Φ, Δ) is C-stable, then for all $x \in \Sigma_{2K}$,
  $\dfrac{1}{C}\,\|x\|_2 \le \|\Phi x\|_2$.
  - It demonstrates that the existence of any decoding algorithm that can stably recover from noisy measurements requires that Φ satisfy the lower bound of the RIP with a constant determined by C.


    End of Lecture 3

    (Chapter 3 is continued in next week)


Data Compression
(ECE 5546-41)

    2012 Fall

    Digital Media Lab.

    Ch 3. Sensing Matrices

    Byeungwoo JeonDigital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

    Digital Media Lab.

How many measurements are necessary to achieve the RIP?

    (Measurement Bound)

    2


Measurement Bound (1)

• Lemma: For K and N satisfying K < N/2, there exists a subset X of $\Sigma_K$ such that for any x in X we have $\|x\|_2 \le \sqrt{K}$, and for any distinct x and z in X,
  $\|x - z\|_2 \ge \sqrt{K/2}$  and  $\log|X| \ge \dfrac{K}{2}\,\log\dfrac{N}{K}$.

• Proof:

Measurement Bound (2)

• Theorem: Let Φ be an M×N matrix that satisfies the RIP of order 2K with constant $\delta \in (0, 0.5]$. Then
  $M \ge C\,K\,\log\!\left(\dfrac{N}{K}\right)$, where $C = \dfrac{1}{2\log(\sqrt{24}+1)} \approx 0.28$.

• Proof:
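A quick numerical reading of this bound (a minimal sketch with made-up N and K, not from the slides):

```python
import numpy as np

N, K = 4096, 50
C = 1.0 / (2.0 * np.log(np.sqrt(24.0) + 1.0))    # the constant from the theorem, ~0.28
M_min = C * K * np.log(N / K)
print(round(C, 3), int(np.ceil(M_min)))          # ~0.282, about 63 measurements at minimum
```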


Measurement Bound (3)

• Johnson-Lindenstrauss lemma:
  $M \ge \dfrac{c_0 \log p}{\epsilon^2}$, where the constant $c_0 > 0$.

    Digital Media Lab.

    How to design the sensing matrix?

    6


RIP and NSP (1)

• Theorem: Suppose Φ satisfies the RIP of order 2K with $\delta_{2K} < \sqrt{2} - 1$. Then Φ satisfies the NSP of order 2K with constant
  $C = \dfrac{\sqrt{2}\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}$.

RIP and NSP (2)

• Lemma: Suppose $u \in \Sigma_K$. Then
  $\dfrac{\|u\|_1}{\sqrt{K}} \le \|u\|_2 \le \sqrt{K}\,\|u\|_\infty$.

• Lemma: Suppose that Φ satisfies the RIP of order 2K, and let $h \in R^N$, $h \ne 0$, be arbitrary. Let $\Lambda_0$ be any subset of {1, 2, ..., N} s.t. $|\Lambda_0| \le K$. Define $\Lambda_1$ as the index set corresponding to the K entries of $h_{\Lambda_0^C}$ with largest magnitude, and set $\Lambda = \Lambda_0 \cup \Lambda_1$. Then
  $\|h_\Lambda\|_2 \le \alpha\,\dfrac{\|h_{\Lambda_0^C}\|_1}{\sqrt{K}} + \beta\,\dfrac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2}$,
  where
  $\alpha = \dfrac{\sqrt{2}\,\delta_{2K}}{1 - \delta_{2K}}$,  $\beta = \dfrac{1}{1 - \delta_{2K}}$.


Matrix Design Satisfying RIP

• Q: How to construct a matrix satisfying the RIP?

• Methods
  1. Deterministic method
  2. Randomization method
     - Method without a specified $\delta_{2K}$ (just assume $\delta_{2K}$ > 0)
     - Method with a specified $\delta_{2K}$ (a particular value of $\delta_{2K}$ is specified)

• Definition of RIP: A matrix Φ satisfies the restricted isometry property (RIP) of order K if there exists a $\delta_K \in (0,1)$ such that, for all $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$,
  $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$.

• Theorem on RIP and NSP: Suppose Φ satisfies the RIP of order 2K with $\delta_{2K} < \sqrt{2} - 1$. Then Φ satisfies the NSP of order 2K with constant
  $C = \dfrac{\sqrt{2}\,\delta_{2K}}{1 - (1 + \sqrt{2})\,\delta_{2K}}$.

Deterministic Matrix Design

• Idea: deterministically construct matrices of size M×N that satisfy the RIP of order K.
  - It requires M to be relatively large.
  - (Ex) Requires $M = O(K^2 \log N)$ in [62]; $M = O(K N^\alpha)$ in [115].
  - In real-world problems, these results lead to an unacceptably large M.


Randomization Matrix Design (1)

• Idea: Choose random numbers for the matrix entries.
  - For given M and N, generate random matrices Φ by choosing the entries $\phi_{ij}$ as independent realizations from some PDF.

• Randomization method without a specified $\delta_{2K}$ (just assume $\delta_{2K}$ > 0)
  - Set M = 2K, and draw Φ according to a Gaussian PDF.
  - With probability 1, any subset of 2K columns is linearly independent, and hence all subsets of 2K columns will be bounded below by (1 − $\delta_{2K}$), where $\delta_{2K}$ > 0.
  - Problem: how to know the value of $\delta_{2K}$?
    - Need to search all combinations of K-dimensional subspaces of $R^N$.
    - Considering realistic values of N and K, such a search takes prohibitively much computation.

Randomization Matrix Design (2)

• Randomization method with a specified value of $\delta_{2K}$
  - We would like to achieve the RIP of order 2K for a specified constant $\delta_{2K}$.
  - It can be achieved by specifying two additional conditions on the PDF.

• Cond 1: Let the PDF yield a matrix that is norm-preserving, that is,
  $E(\phi_{ij}^2) = \dfrac{1}{M}$.
  - Under this condition, the variance of the PDF is 1/M.

• Cond 2: The PDF is sub-Gaussian. That is, there exists a constant c > 0 s.t.
  $E\!\left(e^{\phi_{ij} t}\right) \le e^{c^2 t^2 / 2}$ for all $t \in R$.
  - Note that the moment-generating function of the PDF is dominated by that of a Gaussian PDF, which is also equivalent to requiring that the tails of the PDF decay at least as fast as the tails of a Gaussian PDF.
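These two conditions can be explored empirically. A minimal sketch drawing a Gaussian matrix with entry variance 1/M and checking how well it preserves the norm of random sparse vectors (illustrative only; sampling random supports is not a proof of the RIP):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K, trials = 512, 128, 10, 2000

Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))   # E[phi_ij^2] = 1/M

ratios = []
for _ in range(trials):
    x = np.zeros(N)
    x[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

ratios = np.array(ratios)
print(ratios.min(), ratios.max())   # empirically concentrated around 1, as the RIP requires
```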


Randomization Matrix Design (3)

• Examples of sub-Gaussian PDFs
  - Gaussian, Bernoulli taking values ±1/√M, and more generally any PDF with bounded support.
  - Strictly sub-Gaussian: a PDF satisfying $E\!\left(e^{\phi_{ij} t}\right) \le e^{c^2 t^2 / 2}$ for all $t \in R$ with the constant $c^2 = E(\phi_{ij}^2)$.

• Corollary: Suppose that Φ is an M×N matrix whose entries $\phi_{ij}$ are iid, drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. Let $Y = \Phi x$ for $x \in R^N$. Then for any ε > 0 and any $x \in R^N$,
  $E\!\left(\|Y\|_2^2\right) = \|x\|_2^2$  and  $P\!\left(\big|\,\|Y\|_2^2 - \|x\|_2^2\,\big| \ge \epsilon\,\|x\|_2^2\right) \le 2\exp\!\left(-\dfrac{M\epsilon^2}{\kappa^*}\right)$,  with $\kappa^* = \dfrac{2}{1 - \log(2)} \approx 6.52$.
  - Note that the norm of a sub-Gaussian RV strongly concentrates about its mean.

Randomization Matrix Design (4)

• Theorem: Fix $\delta \in (0, 1)$. Let Φ be an M×N random matrix whose entries $\phi_{ij}$ are drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. If
  $M \ge \kappa_1 K \log\dfrac{N}{K}$,
  then Φ satisfies the RIP of order K with the prescribed δ with probability exceeding $1 - 2e^{-\kappa_2 M}$, where $\kappa_1$ is arbitrary and $\kappa_2 = \dfrac{\delta^2}{2\kappa^*} - \dfrac{\log(42e/\delta)}{\kappa_1}$.

• Note that the measurement bound above achieves the optimal number of measurements (up to a constant).


Why Is the Randomized Method Better?

• One can show that for the random construction, the measurements are democratic.
  - It means that it is possible to recover a signal using any sufficiently large subset of the measurements.
  - Thus, by using a random Φ one can be robust to the loss or corruption of a small fraction of the measurements.

• Universality: can easily accommodate some other basis.
  - In practice, we are often more interested in the setting where x is sparse with respect to some basis Ψ. In this case, what is actually required is the RIP of the product (ΦΨ).
  - In the case of deterministic design, the design process must take Ψ into account.
  - In randomized design, Φ can be designed independently of Ψ.
  - If Φ is Gaussian and Ψ is orthonormal, note that (ΦΨ) is also Gaussian.
  - Furthermore, for sufficiently large M, (ΦΨ) will satisfy the RIP with high probability.

Practical Situation

• In practical implementations, the fully random matrix design may sometimes be impractical to build in HW. Therefore it is possible to:
  - Use a reduced amount of randomness, or
  - Model the architecture via matrices Φ that have significantly more structure than a fully random matrix.
  - Ex: random demodulator [192], random filtering [194], modulated wideband converter [147], random convolution [2,166], compressive multiplier [179]

• Although not quite as easy as in the fully random case, one can prove that many of such constructions also satisfy the RIP.


Coherence

• Definition: The coherence of a matrix Φ, μ(Φ), is the largest absolute inner product between any two columns $\phi_i$, $\phi_j$ of Φ:
  $\mu(\Phi) = \max_{1 \le i < j \le N} \dfrac{|\langle \phi_i, \phi_j \rangle|}{\|\phi_i\|_2\,\|\phi_j\|_2}$

• Note that the coherence satisfies the relation
  $\sqrt{\dfrac{N - M}{M(N - 1)}} \le \mu(\Phi) \le 1$.
  - Its lower bound is called the Welch bound.
  - When N >> M, the lower bound is approximately $\mu(\Phi) \ge 1/\sqrt{M}$.

• Coherence is related to the spark, NSP, and RIP.

Coherence and Spark

• Theorem: The eigenvalues of an N×N matrix M with entries $m_{ij}$, $1 \le i, j \le N$, lie in the union of N discs $d_i = d_i(c_i, r_i)$, $1 \le i \le N$, centered at $c_i = m_{ii}$ and with radius $r_i = \sum_{j \ne i} |m_{ij}|$.

• Lemma: For any matrix Φ,
  $\mathrm{spark}(\Phi) \ge 1 + \dfrac{1}{\mu(\Phi)}$.
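A minimal sketch computing the coherence of a random matrix and comparing it with the Welch bound (illustrative sizes, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 32, 128

Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)          # normalize columns to unit norm

G = np.abs(Phi.T @ Phi)                     # absolute inner products between columns
np.fill_diagonal(G, 0.0)
mu = G.max()                                # coherence mu(Phi)

welch = np.sqrt((N - M) / (M * (N - 1)))    # Welch lower bound
print(mu, welch, 1 / np.sqrt(M))            # mu >= welch, and welch ~ 1/sqrt(M) when N >> M
```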


Coherence and NSP

• Theorem: If
  $K < \dfrac{1}{2}\left(1 + \dfrac{1}{\mu(\Phi)}\right)$,
  then for each measurement vector $y \in R^M$ there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$.

• Lemma: If Φ has unit-norm columns and coherence μ = μ(Φ), then Φ satisfies the RIP of order K with δ = (K − 1)μ for all K < 1/μ.


Data Compression
(ECE 5546-41)

    2012 Fall

    Digital Media Lab.

    Ch 4. Sparse Signal Recovery via L1 Minimization

    Byeungwoo JeonDigital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

    Digital Media Lab.

How to recover a sparse signal from a small number of linear measurements?

    2


Sparse Signal Recovery (1)

• Problem: For y = Φx, and for x assumed to be sparse (or compressible), find $\hat{x}$ satisfying
  $\hat{x} = \underset{z}{\mathrm{argmin}} \ \|z\|_0 \quad \text{subject to} \quad z \in B(y)$,
  where B(y) ensures that $\hat{x}$ is consistent with the measurements y.

• Under the assumption of x being sparse (or compressible), find x corresponding to the measurement y under the L0 optimality condition.
  - The solution seeks the sparsest signal in B(y).
  - Consistent with measurements: depending on the existence of measurement noise, B(y) has two cases:
    $B(y) = \{\, z : \Phi z = y \,\}$ (noise-free case);  $B(y) = \{\, z : \|\Phi z - y\|_2 \le \epsilon \,\}$ (noisy case).

Sparse Signal Recovery (2)

• The framework also holds for the case where x is not apparently sparse.

• In that case, suppose $x = \Psi\alpha$; then the problem is
  $\hat{\alpha} = \underset{z}{\mathrm{argmin}} \ \|z\|_0 \quad \text{subject to} \quad z \in B(y)$,
  where
  $B(y) = \{\, z : \Phi\Psi z = y \,\}$ (noise-free case);  $B(y) = \{\, z : \|\Phi\Psi z - y\|_2 \le \epsilon \,\}$ (noisy case).

• Note that under the assumption that Ψ is an orthonormal basis, it is possible to assume Ψ = I without loss of generality.


Sparse Signal Recovery (3)

• How to solve the L0 minimization problem
  $\hat{x} = \underset{z}{\mathrm{argmin}} \ \|z\|_0 \quad \text{subject to} \quad z \in B(y)$?
  - Note that $\|\cdot\|_0$ is a non-convex function.
  - It is potentially very complex (NP-hard) to solve this minimization problem.

• L0 solution via L1 minimization:
  $\hat{x} = \underset{z}{\mathrm{argmin}} \ \|z\|_1 \quad \text{subject to} \quad z \in B(y)$
  - If B(y) is convex, this problem becomes computationally tractable!
  - The solution prefers a sparse solution in B(y).
  - Big question: Will the L1 solution be similar to the L0 solution?

Why Is L1 Minimization Preferred?

• Intuitively,
  - L1 minimization promotes sparsity.
  - There are a variety of reasons to suspect that L1 minimization will provide an accurate method for sparse signal recovery.
  - L1 minimization provides a computationally tractable approach to signal recovery.
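When B(y) = {z : Φz = y}, the L1 problem above is a linear program (basis pursuit). A minimal sketch solving it with scipy.optimize.linprog via the standard split z = u − v, u, v ≥ 0 (illustrative only; sizes and matrices are made up, not from the slides):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 128, 48, 5

x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)    # K-sparse ground truth
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x

c = np.ones(2 * N)                        # objective: sum(u) + sum(v) = ||z||_1
A_eq = np.hstack([Phi, -Phi])             # constraint: Phi u - Phi v = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))

z_hat = res.x[:N] - res.x[N:]
print(res.status, np.linalg.norm(z_hat - x))   # status 0 and error near 0: exact recovery
```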


    Digital Media Lab.

    Analysis of L1 Minimization Solution

    7

$\hat{x} = \underset{z}{\mathrm{argmin}} \ \|z\|_1 \quad \text{subject to} \quad z \in B(y)$

Noise-free Signal Recovery (1)

• Lemma: Let Φ be a matrix that satisfies the RIP of order 2K with constant $\delta_{2K} < \sqrt{2} - 1$. Let $x, \hat{x} \in R^N$ be given, and define $h = \hat{x} - x$.
  - Λ0 ~ index set corresponding to the K entries of x with largest magnitude
  - Λ1 ~ index set corresponding to the K entries of $h_{\Lambda_0^C}$ with largest magnitude
  Set $\Lambda = \Lambda_0 \cup \Lambda_1$. If $\|\hat{x}\|_1 \le \|x\|_1$, then
  $\|h\|_2 \le C_0\,\dfrac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\dfrac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2}$,
  where
  $C_0 = 2\,\dfrac{1 - (1 - \sqrt{2})\delta_{2K}}{1 - (1 + \sqrt{2})\delta_{2K}}$,  $C_1 = \dfrac{2}{1 - (1 + \sqrt{2})\delta_{2K}}$.

• It shows an error bound for the class of L1 minimization algorithms when combined with a measurement matrix Φ satisfying the RIP.
  - For specific bounds for concrete examples of B(y), we need to examine how requiring $\hat{x} \in B(y)$ affects $|\langle \Phi h_\Lambda, \Phi h\rangle|$.


Noise-free Signal Recovery (2)

• Proof: (self-study)

Noise-free Signal Recovery (3)

• Theorem: Let Φ be a matrix that satisfies the RIP of order 2K with constant $\delta_{2K} < \sqrt{2} - 1$. When B(y) = {z : Φz = y}, the solution $\hat{x}$ to the L1 minimization obeys
  $\|\hat{x} - x\|_2 \le C_0\,\dfrac{\sigma_K(x)_1}{\sqrt{K}}$.

• For $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$ and Φ satisfying the RIP, $\|\hat{x} - x\|_2 = 0$.
  - Note that L1 minimization exactly provides the solution of L0 minimization.
  - In other words, for as few as O(K log(N/K)) measurements, we can exactly recover any K-sparse signal using L1 minimization.
  - This can also be shown to be stable under noisy measurements.


Noise-free Signal Recovery (4)

• Proof: For $\hat{x}$ belonging to B(y), the lemma can be applied to obtain, for $h = \hat{x} - x$,
  $\|h\|_2 \le C_0\,\dfrac{\sigma_K(x)_1}{\sqrt{K}} + C_1\,\dfrac{|\langle \Phi h_\Lambda, \Phi h\rangle|}{\|h_\Lambda\|_2}$.
  Since $x, \hat{x} \in B(y)$, we have $y = \Phi x = \Phi\hat{x}$. Therefore, $\Phi h = 0$, and the second term vanishes, giving
  $\|h\|_2 \le C_0\,\dfrac{\sigma_K(x)_1}{\sqrt{K}}$.   --- Q.E.D. ---

Noisy Signal Recovery (1)

• Theorem: Let Φ be a matrix that satisfies the RIP of order 2K with constant $\delta_{2K} < \sqrt{2} - 1$. Let y = Φx + e with $\|e\|_2 \le \epsilon$ (that is, bounded noise). Then, for $B(y) = \{\, z : \|\Phi z - y\|_2 \le \epsilon \,\}$, the L1 solution $\hat{x}$ obeys
  $\|\hat{x} - x\|_2 \le C_0\,\dfrac{\sigma_K(x)_1}{\sqrt{K}} + C_2\,\epsilon$,
  where
  $C_0 = 2\,\dfrac{1 - (1 - \sqrt{2})\delta_{2K}}{1 - (1 + \sqrt{2})\delta_{2K}}$,  $C_2 = \dfrac{4\sqrt{1 + \delta_{2K}}}{1 - (1 + \sqrt{2})\delta_{2K}}$.

• This provides a bound on the worst-case performance for uniformly bounded noise.


    Noisy Signal Recovery (2)Noisy Signal Recovery (2)

    n Proof: (self-study)

    Digital Media Lab. 13

    Noisy Signal Recovery (3)Noisy Signal Recovery (3)

    n What is the bound of recovery error if the noise is Gaussian ?

    n Corollary: Let F be a sensing matrix that satisfies the RIP of order-

    2, ( , )My x e where e R iid with N o s= F +

    -

    Digital Media Lab.

    , ,obtain measurement y=Fx+e where the entries of e are iid N(0,s2).Then, when the solution to L1minimization obeys,

    14

    ( )0222

    1 8 1

    1 (1 2)

    c MK

    K

    x x M with probability at least ed

    sd

    -+- -- +

    { }2( ) | 2 ,B y z z y Ms= F -


    Digital Media Lab.

    How to recover a non-sparse signal from a

    small number of linear measurements ?

    15

Instance-optimal Guarantee (1)

• Theorem: Let Φ be an M×N matrix and suppose that $\Delta : R^M \to R^N$ is a recovery algorithm satisfying
  $\|x - \Delta(\Phi x)\|_2 \le C\,\sigma_K(x)_2$ for some K ≥ 1;
  then
  $M > \left(1 - \sqrt{1 - 1/C^2}\right) N$.

• In order to make a bound of this form hold for all signals x with a constant C ≈ 1, then, regardless of what recovery algorithm is being used, we need to take M ≈ N measurements.


    RfRf: Instance: Instance--Optimal?Optimal?

    n The theorem says not only about exact recovery of all possible k-sparse signals, but also ensures a degree of robustness to non-sparse signals that directly depends on how well the signals areapproximated by k-sparse vectors.

    Instance-optimal guarantee (i.e., it guarantees optimal performance foreach instance of x)

    n Cf: Guarantee that onlyholds for some subsetof possible signals,

    Digital Media Lab.

    such as compressible or sparse signals (the quality of guaranteeadapts to the particular choice of x)n In that sense, instance-optimality is also commonly referred to as

    uniform guarantee since they hold uniformly for all x.

    17

    InstanceInstance--optimal Guarantee (2)optimal Guarantee (2)

    n Theorem: Fix . Let F be a MxN random matrix whoseentries are iid with drawn according to a strictly sub-Gaussiandistribution with If,

    then, F satisfies the RIP of order K with the prescribed d with

    (0,1)d

    1log

    NM K

    Kk

    ijf

    ijf

    21 / .c M=

    Digital Media Lab.

    pro a y excee ng , w ere s ar rary an21 2e-- 1k

    ( )2 *2 1/ 2 log 42 / /ek d k d k = -


    InstanceInstance--optimal Guarantee (3)optimal Guarantee (3)

    n Theorem: Let be fixed. Set , and suppose that Fbe a MxN sub-Gaussian random matrix with

    and measurement is y= Fx. Set . Then, with probabilityexceeding , when the L1

    NR

    ( )321 2 MMe e kk --- -

    22 1

    Kd < -

    1log

    NM K

    K

    k

    2 ( )

    Ke s=

    { }2

    ( ) | ,B y z z y e= F -

    Digital Media Lab.

    ,

    2 2

    22

    2

    1 (1 2 ) 8 ( )

    1 (1 2)

    K K

    K

    K

    x xd d

    sd

    + - +-

    - +

    Digital Media Lab.

    End of Chapter 4

    20


Data Compression (ECE 5546-41)

    2012 Fall

    Digital Media Lab.

    Ch 5. Algorithms for Sparse Recovery

    Part 1

    Byeungwoo Jeon

    Digital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

    Digital Media Lab.

    Various recovery algorithms for compressed-

    sensed sparse signal

    2


Sparse Signal Recovery (1)

n Problem: For y = Fx and for x assumed to be sparse (or compressible), find x̂ satisfying

     x̂ = argmin_z ||z||_0 subject to z ∈ B(y)     (From Chapter 4)

where B(y) ensures that x̂ is consistent with the measurements y.
n Under an assumption of x being sparse (or compressible), find x corresponding to measurement y under the L0 optimality condition.
  n The solution seeks the sparsest signal in B(y).
  n Consistent with measurements: depending on the existence of measurement noise, B(y) has two cases:

     B(y) = { z | Fz = y }  (noise-free case),     B(y) = { z | ||Fz − y||_2 ≤ ε }  (noisy case)

  n Loss (cost) function other than the Euclidean distance may also be appropriate.

Sparse Signal Recovery (2)

n How to solve the L0 minimization problem?     (From Chapter 4)

     x̂_0 = argmin_z ||z||_0 subject to z ∈ B(y),
     B(y) = { z | Fz = y }  (noise-free case),     B(y) = { z | ||Fz − y||_2 ≤ ε }  (noisy case)

  n Note that ||.||_0 is a non-convex function.
  n Potentially very complex (NP-hard) to solve this minimization problem.
n L0 solution via L1 minimization

     x̂_1 = argmin_z ||z||_1 subject to z ∈ B(y)

  n If B(y) is convex, this problem becomes computationally tractable!
  n The solution prefers a sparse solution in B(y).
n Big Question: Will the L1 solution be similar to the L0 solution?

Use of Different Norms

n Solve the underdetermined system

     y = Fx,   where F ∈ R^(M×N), x ∈ R^N, y ∈ R^M,   M << N

n L2 norm (p = 2): small penalty on small residuals, strong penalty on large residuals (a small illustration follows below)
n Recovery can be posed as a regularized optimization problem of the form

     min_x { J(x) + μ H(Fx, y) }

n The parameter μ can be found by trial-and-error, or by a statistical technique such as cross-validation.
n Actually, deciding a proper value of μ is a research problem.
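Rf: a minimal Python sketch (assuming numpy is available; illustrative only, not part of the original slides). It shows why the plain minimum-L2-norm (pseudo-inverse) solution is unsatisfying for sparse recovery: it is consistent with the measurements but generally dense.

import numpy as np

rng = np.random.default_rng(0)
M, N, K = 20, 80, 3                                  # M << N, K-sparse ground truth
F = rng.standard_normal((M, N)) / np.sqrt(M)         # random sensing matrix
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = F @ x_true

x_l2 = np.linalg.pinv(F) @ y                         # minimum-L2-norm solution of y = Fx
print("residual:", np.linalg.norm(F @ x_l2 - y))     # ~0: consistent with the measurements
print("nonzeros in x_true:", np.count_nonzero(x_true))
print("large entries in x_l2:", int(np.sum(np.abs(x_l2) > 1e-6)))   # typically ~N, not sparse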

Convex optimization-based method (2)

n Ex: J(x) = ||x||_p
  n p = 0 (L0 norm): directly measures sparsity (but hard to solve)

     min_x { J(x) | y = Fx }   →   min_x ||x||_0 subject to y = Fx

  n p = 1 (L1 norm): gives robustness against outliers
n Ex: H(Fx, y) = ||Fx − y||_p, for example p = 2
n Ex: the noisy case can be modified in several ways:

     min_x { J(x) | H(Fx, y) ≤ ε }   →   min_x ||x||_0 subject to ||Fx − y||_2 ≤ ε
     min_x ||Fx − y||_2 subject to ||x||_0 ≤ K
     min_x (1/2)||Fx − y||²_2 + μ||x||_0,   μ > 0

  (Review: convexity, optimization, etc.)

Convex optimization-based method (3)

n Standard optimization packages cannot be used for real applications of CS since the number of unknowns (that is, the dimension of x) is very large.
n If there are no restrictions on the sensing matrix F and the signal x, the solution to the sparse approximation is very complex (NP-hard).
  n In practice, sparse approximation algorithms tend to be slow unless the sensing matrix F admits a fast matrix-vector multiply (like a fast transform algorithm utilizing matrix structure).
  n In case of a compressible signal which needs some transformation first, fast multiplication is possible when both the sensing (random) matrix and the sparsity basis are structured.
n Then, the question is how to incorporate more sophisticated signal constraints into sparsity models.

(Need 1~2 volunteers to investigate fast computation utilizing the structure of the sensing matrix)

L0 Approach (1)

n The L0 norm explicitly computes the number of nonzero components of the given data
  n Directly related to sparsity of a signal
  n A function card(x): cardinality
  n For a scalar x:  card(x) = 0 (x = 0) and 1 (x ≠ 0)
n card(x) has no convexity properties.
n Note however it is quasi-concave on R^n_+ since, for x, y ≥ 0,

     card(x + y) ≥ min{ card(x), card(y) }

From Prof. S. Boyd (EE364a, b) Stanford Univ

Rf: Quasiconvexity (1)

n Quasiconvex function: a real-valued function defined on an interval or on a convex subset of a real vector space such that the inverse image of any set of the form (−∞, a) is a convex set.
  n Informally, along any stretch of the curve, the highest point is one of the endpoints.
  n The negative of a quasiconvex function is said to be quasiconcave.

[Figure] A quasiconvex function that is not convex.
[Figure] A function that is not quasiconvex: the set of points in the domain of the function for which the function values are below the dashed red line is the union of the two red intervals, which is not a convex set.

http://en.wikipedia.org/wiki/Quasiconvex_function

Rf: Quasiconvexity (2)

n Def: A function f : S → R defined on a convex subset S of a real vector space is quasiconvex if for all x, y ∈ S and λ ∈ [0,1],

     f(λx + (1−λ)y) ≤ max{ f(x), f(y) }

  n Note that the points x and y, and the point directly between them, can be points on a line or more generally points in n-dimensional space.
  n In words, if f is such that it is always true that a point directly between two other points does not give a higher value of the function than do both of the other points, then f is quasiconvex.
n An alternative way of defining a quasiconvex function is to require that each sub-level set S_α(f) is a convex set:

     S_α(f) = { x | f(x) ≤ α } ~ convex set

n A concave function can be a quasiconvex function. For example, log(x) is concave, and it is quasiconvex.
n Any monotonic function is both quasiconvex and quasiconcave. More generally, a function which decreases up to a point and increases from that point on is quasiconvex (compare unimodality).

http://en.wikipedia.org/wiki/Quasiconvex_function

Rf: Quasiconvexity (3)

n Quasiconvexity is a generalization of convexity.
  n All convex functions are also quasiconvex, but not all quasiconvex functions are convex.
n A function that is both quasiconvex and quasiconcave is quasilinear.

[Figure] The probability density function of the normal distribution is quasiconcave but not concave.
[Figure] A quasilinear function is both quasiconvex and quasiconcave.

http://en.wikipedia.org/wiki/Quasiconvex_function

L0 Approach (2)

n General convex-cardinality problem
  n It refers to a problem that would be convex, except for the appearance of card(.) in the objective or constraints.
  n Example: for f, C convex,

     Minimize card(x) subject to x ∈ C
     Minimize f(x) subject to x ∈ C, card(x) ≤ K

n Solving the convex-cardinality problem: for x ∈ R^n,
  n Fix a sparsity pattern of x (i.e., which entries are zero/nonzero), then solve its convex problem
  n If we solve the 2^n convex problems associated with all possible sparsity patterns, the convex-cardinality problem is solved completely (a small brute-force sketch is given below)
  n However, practically possible only for n ≤ 10
  n The general convex-cardinality problem is NP-hard.

From Prof. S. Boyd (EE364a, b) Stanford Univ
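Rf: a minimal Python sketch (assuming numpy; illustrative only, not from the slides) of the brute-force idea above: enumerate supports of size up to K and solve a least-squares problem on each, which is feasible only for very small n.

import itertools
import numpy as np

def l0_brute_force(F, y, K):
    # Return the vector with support size <= K that minimizes ||F x - y||_2.
    N = F.shape[1]
    best_x, best_err = np.zeros(N), np.linalg.norm(y)
    for k in range(1, K + 1):
        for support in itertools.combinations(range(N), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(F[:, cols], y, rcond=None)   # convex subproblem on this pattern
            err = np.linalg.norm(F[:, cols] @ coef - y)
            if err < best_err:
                best_err = err
                best_x = np.zeros(N)
                best_x[cols] = coef
    return best_x, best_err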


L0 Approach (3)

n Many forms of optimization problems

     Minimize ||Fx − y||_2 subject to ||x||_0 ≤ K
     Minimize ||x||_0 subject to ||Fx − y||_2 ≤ ε
     Minimize ||Fx − y||²_2 + λ||x||_0

n L1-norm Heuristic
  n Replace ||x||_0 with λ||x||_1 or add the regularization term λ||x||_1 to the objective function.
  n λ is a parameter used to achieve the desired sparsity
  n More sophisticated versions use Σ_i w_i|x_i| or Σ_i w_i(x_i)_+ + Σ_i v_i(x_i)_−, where w and v are positive weights.

From Prof. S. Boyd (EE364a, b) Stanford Univ

Rf: Reweighted L1 algorithm (1)

n (joint work of E. Candes, M. Wakin and S. Boyd)
n Minimum L0 recovery requires minimal oversampling but is intractable:

     min ||x||_0 = Σ_i 1_{x_i ≠ 0}   subject to   y = Fx

n Observation: if x* is the solution to the combinatorial search and

     w_i* = 1/|x_i*|  (x_i* ≠ 0),     w_i* = ∞  (x_i* = 0),

then x* is also the solution to

     min Σ_i w_i* |x_i|   subject to   y = Fx

From CS Theory Lecture Notes by E. Candes, 2007

Rf: Reweighted L1 algorithm (2)

Initial step:  w_i^(0) = 1 for all i
Loop: For j = 1, 2, 3, ...
  n Solve    x^(j) = argmin Σ_i w_i^(j−1) |x_i|   such that   y = Fx
  n Update   w_i^(j) = 1 / ( |x_i^(j)| + ε )
n Until convergence (typically 2~5 iterations)
n Intuition: down-weight large entries of x to mimic the magnitude-insensitive L0-penalty (a small sketch follows below).

    From CS Theory Lecture Notes by E. Candes, 2007
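Rf: a small Python sketch of the reweighted-L1 loop above (assumptions: noise-free measurements y = Fx, small problem sizes, scipy available). Each weighted L1 subproblem is recast as a linear program in the variables (x, t) with |x_i| ≤ t_i.

import numpy as np
from scipy.optimize import linprog

def weighted_basis_pursuit(F, y, w):
    # min sum_i w_i |x_i|  subject to  F x = y
    M, N = F.shape
    c = np.concatenate([np.zeros(N), w])              # objective acts on t only
    A_ub = np.block([[np.eye(N), -np.eye(N)],         #  x - t <= 0
                     [-np.eye(N), -np.eye(N)]])       # -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([F, np.zeros((M, N))])           # F x = y
    bounds = [(None, None)] * N + [(0, None)] * N     # x free, t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[:N]

def reweighted_l1(F, y, iters=5, eps=1e-3):
    w = np.ones(F.shape[1])                           # w_i^(0) = 1
    for _ in range(iters):                            # typically 2~5 iterations
        x = weighted_basis_pursuit(F, y, w)
        w = 1.0 / (np.abs(x) + eps)                   # w_i^(j) = 1 / (|x_i^(j)| + eps)
    return x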

Rf: Reweighted L1 algorithm (3)

n Empirical performance (figures omitted)

From CS Theory Lecture Notes by E. Candes, 2007

L1 Approach (1)

n Connection between the L1 norm and sparsity
  n Known for a long time, early 70s
  n Mainly studied in Geophysics (literature on sparse spike trains)
  n Key rough empirical fact is that L1 returns a sparse solution
n Replace the combinatorial L0 function with the L1 norm, yielding a convex optimization problem
  n It makes the problem tractable!
n There can be several variants of the problem.

From CS Theory Lecture Notes by E. Candes, 2007

L1 Approach (2)

n The L1 minimization problem

     min_x ||x||_1,   Fx = y,   x ∈ R^N, y ∈ R^M

  n There is always a solution with at most M non-zero terms
  n In general, the solution is unique
n Similarly, for

     min_x ||Fx − y||_1,   x ∈ R^N, y ∈ R^M

  n There is always a solution whose residual (r = y − Fx) has at most (N − M) non-zero terms
  n In general, the solution is unique

From CS Theory Lecture Notes by E. Candes, 2007

L1 Approach (3)

n Variant
  n Start with the minimum cardinality problem (C: convex)

     Minimize card(x) subject to x ∈ C

  n Apply the heuristic to obtain the L1-norm minimization problem

     Minimize ||x||_1 subject to x ∈ C

n Variant
  n Start with the cardinality-constrained problem (f, C: convex)

     Minimize f(x) subject to x ∈ C, card(x) ≤ K

  n Apply the heuristic to obtain the L1-norm constrained problem

     Minimize f(x) subject to x ∈ C, ||x||_1 ≤ β

  n Or the L1-regularized problem

     Minimize f(x) + γ||x||_1 subject to x ∈ C

  n β, γ are adjusted so that card(x) ≤ K.

    From Prof. S. Boyd (EE364a, b) Stanford Univ

L1 Approach (4)

n Variant with polishing
  n Use the L1 heuristic to find an x estimate with the required sparsity
  n Fix the sparsity pattern of x
  n Re-solve the (convex) optimization problem with this sparsity pattern to obtain the final (heuristic) solution.

From Prof. S. Boyd (EE364a, b) Stanford Univ

    Some examples: convex optimization


    From Computational Methods for Sparse 2010 IEEE Proceedings by J.A. Tropp and J. Wright

Equality-constrained Problem

n Equality-constrained problem
  n Among all x consistent with the measurements, pick one with minimum L1 norm:

     min_x ||x||_1 subject to y = Fx     (C1)

From Computational Methods for Sparse 2010 IEEE Proceedings by J.A. Tropp and J. Wright

Convex Relaxation Method

n Convex Relaxation Method

     min_x (1/2)||Fx − y||²_2 + μ||x||_1,   μ > 0     (C2)

  n μ is a regularization parameter: it governs the sparsity of the solution
  n Large μ typically produces sparser results.
n How to choose μ?
  n Often need to solve the equation repeatedly for different choices of μ, or to trace systematically the path of solutions as μ decreases towards zero.

From Computational Methods for Sparse 2010 IEEE Proceedings by J.A. Tropp and J. Wright

LASSO

n Least Absolute Shrinkage and Selection Operator (LASSO) method

     min_x ||Fx − y||²_2 subject to ||x||_1 ≤ β     (C3)

  n It is equivalent to the convex relaxation method (C2) in the sense that the path of solutions of (C3) parameterized by positive β matches the solution path of (C2) as μ varies. (A small example follows below.)
n Rf: its L0 version:

     min_x ||Fx − y||_2 subject to ||x||_0 ≤ K

  n Can interpret this as fitting the vector y as a linear combination of K regressors (chosen from N possible regressors) ~ feature selection (in statistics).
  n i.e., choose a subset of K regressors that (together) best fit or explain y.
  n It can be solved (in principle) by trying all choices.
n Rf: An independent variable is also known as a "predictor variable", "regressor", "controlled variable", "manipulated variable", "explanatory variable", "feature" (see machine learning and pattern recognition) or an "input variable".

From CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin
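Rf: a small example (assuming scikit-learn is installed; illustrative only). sklearn's Lasso solves min_w (1/(2M))||Fw − y||²_2 + α||w||_1, i.e., the convex relaxation (C2) up to a rescaling of the regularization parameter.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
M, N, K = 40, 120, 5
F = rng.standard_normal((M, N)) / np.sqrt(M)          # random sensing matrix
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = 1.0
y = F @ x_true

lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(F, y)
x_hat = lasso.coef_                                    # sparse estimate of x
print("entries above 1e-3:", int(np.sum(np.abs(x_hat) > 1e-3)))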


Others

n Quadratic relaxation (LASSO)
  n Explicit parameterization of the error norm

     min_x ||x||_1 subject to ||Fx − y||_2 ≤ ε     (C4)

n Dantzig selector (with residual correlation constraints)

     min_x ||x||_1 subject to ||F^T(Fx − y)||_∞ ≤ ε

    From CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin

Further study (Volunteer?)

n Other optimization algorithms:
  n interior point methods (slow, but extremely accurate)
  n homotopy methods (fast and accurate for small-scale problems)

Gradient Method (1)

n (also known as the first-order method) iteratively solves the following problem

     min_x (1/2)||Fx − y||²_2 + μ||x||_1,   μ > 0     (C2)

n Similar methods under this category
  n Operator splitting [65]
  n Iterative splitting and thresholding (IST) [66]
  n Fixed-point iteration [67]
  n Sparse reconstruction via separable approximation (SpaRSA) [68]
  n TwIST [70]
  n GPSR [71]

From Computational Methods for Sparse 2010 IEEE Proceedings by J.A. Tropp and J. Wright

Gradient Method (2)

n Gradient-descent framework

Input: a signal y ∈ R^M, sensing matrix F ∈ R^(M×N), regularization parameter μ > 0, and initial estimate x^0
Output: coefficient vector x ∈ R^N
Algorithm:
(1). Initialize: set k = 1.
(2). Subproblem: choose α_k > 0 and compute

     z^k := argmin_z (z − x^k)^T F^*(Fx^k − y) + (α_k/2)||z − x^k||²_2 + μ||z||_1

     If an acceptance test on z^k is not passed, increase α_k by some factor and repeat.
(3). Line search: choose γ_k ∈ (0,1] and obtain x^(k+1) from

     x^(k+1) := x^k + γ_k (z^k − x^k)

(4). Test: If the stopping criterion holds, terminate with x = x^(k+1). Otherwise, set k ← k+1 and go to (2).
(A simplified sketch of this iteration follows below.)

    81/163

    Gradient Method (3)Gradient Method (3)

    n This gradient-based method works well on sparse signals when thedictionary F satisfies RIP.n It benefits from warm starting, that is, the work required to identify a

    solution can be reduced dramatically when the initial estimate of x is close

    to the solution.

    n Continuation strategyn Solve the optimization problem (C2) for a decreasing sequences ofm

    Digital Media Lab.

    using the approximate solution for each value as the starting point for thenext sub-problem.

    33From Computational Methods for Sparse 2010 IEEE

    Procedings by J.A. Tropp and J. Wright

    Review:Review:

    Convex OptimizationConvex Optimization

    Digital Media Lab. 34

  • 7/30/2019 Data Compressing

    82/163

    References (1)References (1)

    n Introduction to Optimizationn http://ocw.mit.edu/courses/electrical-engineering-and-computer-

    science/6-079-introduction-to-convex-optimization-fall-2009/index.htm

    Digital Media Lab. 35

    References (2)References (2)

    n Convex Optimization (EE364a by Prof. Boyd)n http://www.stanford.edu/class/ee364a/lectures.htmln Video lecture is also available Introduction

    Convex sets

    Convex functions

    Convex optimization problems

    DualityApproximation and fitting

    Statistical estimation

    Geometric roblems

    Digital Media Lab. 36

    Numerical linear algebra background

    Unconstrained minimization

    Equality constrained minimization

    Interior-point methods

    Conclusions

    Lecture slides in one file.

    Additional lecture slides:

    Convex optimization examples

    Stochastic programming

    Chance constrained optimization

    Filter design and equalizationDisciplined convex programming and CVX

    Two lectures from EE364b:

    methods for convex-cardinality problems

    methods for convex-cardinality problems, part II

  • 7/30/2019 Data Compressing

    83/163

Mathematical Optimization Problem

n Optimization problem (figure omitted)

From Prof. S. Boyd (EE364a, b) Stanford Univ

Solving Optimization Problem

n General optimization problem

     min f_0(x) subject to f_i(x) ≤ y_i, i = 1, ..., m

  n Very difficult to solve
  n Methods involve some compromise, e.g., very long computation time, or not always finding the solution
n Exceptions: certain problem classes can be solved efficiently and reliably
  n Least-squares problems:  min_x ||Fx − y||²_2
     n Analytical solution  x* = (F^T F)^(−1) F^T y  (see the small check below)
  n Linear programming problems:  min c^T x subject to a_i^T x ≤ y_i, i = 1, ..., m
     n No analytical formula for the solution
     n Reliable and efficient algorithms and software ~ a mature technology
  n Convex optimization problems
     n Objective and constraint functions are convex
     n It includes least-squares problems and linear programming as special cases

    From Prof. S. Boyd (EE364a, b) Stanford Univ
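Rf: a small numerical check (numpy; illustrative only) of the analytical least-squares solution quoted above.

import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((50, 10))                     # tall, full-column-rank matrix
y = rng.standard_normal(50)

x_closed_form = np.linalg.solve(F.T @ F, F.T @ y)     # x* = (F^T F)^(-1) F^T y
x_lstsq, *_ = np.linalg.lstsq(F, y, rcond=None)
print(np.allclose(x_closed_form, x_lstsq))            # True: both give the LS solution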


Optimization problem in standard form

Optimal & locally optimal points

Implicit constraints

Convex Set and Others (1)

n Def: A set Ω is convex if and only if for any x1 and x2 ∈ Ω and for any λ, 0 ≤ λ ≤ 1, the convex combination x = λx1 + (1−λ)x2 ∈ Ω
n Example (figures omitted): convex set, non-convex set, non-convex set
n Def: Convex combination of x1, ..., xk: any point x of the form x = λ1x1 + λ2x2 + ... + λkxk with λ1 + ... + λk = 1, λj ≥ 0
n Def: Convex hull (conv S): the set of all convex combinations of points in S

From Prof. S. Boyd (EE364a, b) Stanford Univ

Convex Set and Others (2)

n Def: Conic (nonnegative) combination of x1 and x2: any point of the form x = λ1x1 + λ2x2 with λ1 ≥ 0, λ2 ≥ 0
n Def: Convex cone: a set that contains all conic combinations of points in the set

From Prof. S. Boyd (EE364a, b) Stanford Univ

Convex function

n Def: A function f(x): Ω → R is convex if and only if any convex combination x = λx1 + (1−λ)x2, for all x1, x2 ∈ Ω and λ, 0 ≤ λ ≤ 1, satisfies

     f(λx1 + (1−λ)x2) ≤ λf(x1) + (1−λ)f(x2)

n Note that f is concave if (−f) is convex
n A function f is strictly convex iff  f(λx1 + (1−λ)x2) < λf(x1) + (1−λ)f(x2)

From Prof. S. Boyd (EE364a, b) Stanford Univ

1st-order Condition

n Def: A function f is differentiable if dom f is open and the gradient

     ∇f(x) = ( ∂f/∂x_1, ..., ∂f/∂x_n )

exists at each x ∈ dom f.
n Def: 1st-order condition: A differentiable function f with convex domain is convex iff

     f(y) ≥ f(x) + ∇f(x)^T (y − x)   for all x, y ∈ dom f.

From Prof. S. Boyd (EE364a, b) Stanford Univ

2nd-order Condition

n Def: A function f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

     (∇²f(x))_ij = ∂²f / (∂x_i ∂x_j)   for 1 ≤ i, j ≤ N,

exists at each x ∈ dom f.
n Def: 2nd-order condition: for a twice-differentiable function f with convex domain, the function is convex if and only if

     ∇²f(x) ⪰ 0   for all x ∈ dom f

n Strict convexity: no equality sign

From Prof. S. Boyd (EE364a, b) Stanford Univ

Data Compression (ECE 5546-41)

2012 Fall

    Ch 5. Algorithms for Sparse Recovery

    Part 2

    Byeungwoo Jeon

    Digital Media Lab, SKKU, Korea

    http://media.skku.ac.kr; [email protected]

Recovery Algorithms (1)

n Category 1: Convex optimization approach (or convex relaxation)
  n Replace the combinatorial problem with a convex optimization problem.
  n Solve the convex optimization problem with algorithms which can exploit the problem structure.
n Category 2: Greedy algorithms
  n Greedy pursuits
     n Iteratively refine a sparse solution by successively identifying one or more components that yield the greatest improvements in quality.
     n In general very fast and applicable to very large datasets; however, theoretical performance guarantees are typically weaker than those of some other methods.
  n Thresholding algorithms
     n These methods alternate both element selection as well as element pruning steps. They are often very easy to implement and can be relatively fast.
     n These have theoretical performance guarantees that rival those derived for convex optimization-based approaches.

Recovery Algorithms (2)

n Category 3: Bayesian framework
  n Assume a prior distribution for the unknown coefficients favoring sparsity.
  n Develop a maximum a posteriori estimator incorporating the observation.
  n Identify a region of significant posterior mass or average over most-probable models.
n Category 4: Other approaches
  n Non-convex optimization method: relax the L0 problem to a related non-convex problem and attempt to identify a stationary point.
  n Brute-force method: search through all possible support sets, possibly using cutting-plane methods to reduce the number of possibilities.
  n Heuristic method: based on belief-propagation and message-passing techniques developed in graphical models and coding theory.

    Greedy Algorithms

A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time.

http://en.wikipedia.org/wiki/Greedy_algorithm

Greedy Algorithm (1)

n Starting at A, a greedy algorithm (GA) will find the local maximum at "m", instead of the global maximum at "M".
[Figure: searching for the global maximum starting from A]

http://en.wikipedia.org/wiki/Greedy_algorithm

Greedy Algorithm (2)

n Ex: How to pay 36 cents using only coins with values {1, 5, 10, 20}?
n The greedy algorithm determines the minimum number of coins to give while making change. These are the steps a human would take to emulate a greedy algorithm to represent 36 cents using only coins with values {1, 5, 10, 20}.
n The coin of the highest value, less than the remaining change owed, is the local optimum. (A small sketch follows below.)
  n (Note that in general the change-making problem requires dynamic programming or integer programming to find an optimal solution.)
  n However, most currency systems, including the Euro (pictured) and US Dollar, are special cases where the greedy strategy does find an optimum solution.

http://en.wikipedia.org/wiki/Greedy_algorithm
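Rf: a direct Python sketch of the coin example above (illustrative only; the helper name is made up for this note).

def greedy_change(amount, coins=(20, 10, 5, 1)):
    # Greedy change-making: repeatedly take the largest coin not exceeding what is left.
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            picked.append(coin)
            amount -= coin
    return picked

print(greedy_change(36))   # [20, 10, 5, 1] -> 4 coins for 36 cents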


Sparse Signal Recovery via Greedy Algorithm

n Problem: For y = Fx and for x assumed to be sparse (or compressible), find x̂ satisfying

     x̂ = argmin_z ||z||_0 subject to z ∈ B(y)

where B(y) ensures that x̂ is consistent with the measurements y:

     B(y) = { z | Fz = y }  (noise-free case),     B(y) = { z | ||Fz − y||_2 ≤ ε }  (noisy case)

n This problem can be re-written as

     min |Ω| such that y = Σ_{i ∈ Ω} x_i φ_i

n Ω denotes a particular subset of the indices i = 1, ..., N, and φ_i denotes the i-th column of F.
n Use a greedy algorithm to find the index set Ω.

Rf: Greedy Algorithms

n Greedy algorithms have been called in different terms in other fields
  n Statistics: Forward stepwise regression

    n Nonlinear approximation: Pure greedy algorithm

    n Signal Processing: Matching pursuit

    n Radio Astronomy: CLEAN algorithm


Basic idea of Pursuit algorithm (3)

n Pre-normalization: assume that all the columns of F are normalized by multiplying by the normalizing matrix W

     F^(normalizing) = F W,   where F ∈ R^(M×N), W ∈ R^(N×N),
     W = diag( 1/||φ_1||_2, ..., 1/||φ_N||_2 )

n Under this pre-normalization, the solution of the pursuit algorithm can be easily found by identifying the column maximizing (φ_j^T y)².
n As a final step, the solution vector x should be post-normalized by W x^(normalizing).
n Theorem tells that the normalization does not change the solution.
n From now on, assume pre-normalization without loss of generality.

Basic idea of Pursuit algorithm (4)

n Suppose K > 1: since y is a linear combination of K columns of F, the problem is to find a subset of F consisting of K columns.
  n Need to enumerate (N choose K) ~ O(N^K) combinations
n Greedy algorithm (pursuit-based methods): instead of the exhaustive search, select columns one by one in favor of the local optimum.
  n Starting from x^(0) = 0 (residual r^(0) = y), it iteratively constructs a K-term approximation, expanding the set by one additional column.
  n The additional column at each stage is the one which maximally reduces the residual error (in the L2 sense) in approximating the measurement y using the currently active columns.
  n Residual: as-yet unexplained portion of the measurement
  n After constructing an approximation including a new column, a new residual vector is computed by subtracting the approximation represented by the newly selected column from the current residual.
  n A new residual L2 error is evaluated: if it falls below a threshold, the algorithm terminates; otherwise, it looks for another column. (A small sketch of this loop follows below.)
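Rf: a minimal Python sketch of the greedy loop just described (assumptions: columns of F pre-normalized to unit L2 norm; this is essentially the matching pursuit iteration discussed on the following slides).

import numpy as np

def greedy_pursuit(F, y, K, tol=1e-6):
    N = F.shape[1]
    x, r = np.zeros(N), y.copy()             # x^(0) = 0, residual r^(0) = y
    for _ in range(K):
        corr = F.T @ r                        # correlate the residual with every column
        j = int(np.argmax(np.abs(corr)))      # column that best explains the residual
        x[j] += corr[j]                       # update the coefficient of the selected column
        r = r - corr[j] * F[:, j]             # subtract its contribution from the residual
        if np.linalg.norm(r) < tol:           # stop once the residual error is small enough
            break
    return x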


Various Pursuit Algorithms

    n Matching Pursuit (MP) ~ also known as pure greedy algorithm

    n Orthogonal Matching Pursuit (OMP)

    n Weak-Matching Pursuit

n These algorithms all belong to Greedy algorithms (GA)
n Its variants include

    Digital Media Lab.

n Pure GA (PGA)

n Orthogonal GA (OGA)

n Relaxed GA (RGA)

    n Weak GA (WGA)

n Rf: At this point, it is not fully clear what role greedy pursuit algorithms will ultimately play in practice.
From Computational Methods for Sparse 2010 IEEE Proceedings by J.A. Tropp and J. Wright


    Category 2: Greedy algorithms

    n Greedy pursuits

    n Thresholding algorithms


Matching Pursuit (MP) (1)

n First proposed by Mallat and Zhang*: iterative greedy algorithm that decomposes a signal into a linear combination of elements from a dictionary (i.e., sensing matrix).

Inputs: Sensing matrix F, measurement vector y, error threshold ε_0
Outputs: a sparse signal x
Initialize: Set k = 0, index set Ω^(0) = ∅, and residual r^(0) = y.
Main Iteration: Increment k by 1 and perform the following steps: