journal of la low rank regularization: a review · journal of latex class files, vol. 14, no. 8,...

Low Rank Regularization: A review

Zhanxuan Hua,c, Feiping Niea,c, Rong Wang*b,c, Xuelong Lia,c

aSchool of Computer Science, Northwestern Polytechnical University, Xi’an, 710072,Shaanxi, P.R. China.

bSchool of Cybersecurity, Northwestern Polytechnical University, Xi’an, 710072, Shaanxi,P.R. China.

cCenter for OPTical IMagery Analysis and Learning (OPTIMAL), NorthwesternPolytechnical University, Xi’an, 710072, Shaanxi, P.R. China.

Abstract

Low Rank Regularization (LRR), in essence, involves introducing a low rankor approximately low rank assumption to target we aim to learn, which hasachieved great success in many data analysis tasks. Over the last decade,much progress has been made in theories and applications. Nevertheless,the intersection between these two lines is rare. In order to construct abridge between practical applications and theoretical studies, in this paperwe provide a comprehensive survey for LRR. Specifically, we first reviewthe recent advances in two issues that all LRR models are faced with: (1)rank-norm relaxation, which seeks to find a relaxation to replace the rankminimization problem; (2) model optimization, which seeks to use an efficientoptimization algorithm to solve the relaxed LRR models. For the first issue, weprovide a detailed summarization for various relaxation functions and concludethat the non-convex relaxations can alleviate the punishment bias problemcompared with the convex relaxations. For the second issue, we summary therepresentative optimization algorithms used in previous studies, and analysistheir advantages and disadvantages. As the main goal of this paper is topromote the application of non-convex relaxations, we conduct extensiveexperiments to compare different relaxation functions. The experimentalresults demonstrate that the non-convex relaxations generally provide a largeadvantage over the convex relaxations. Such a result is inspiring for furtherimproving the performance of existing LRR models.

∗Corresponding author: Rong Wang ([email protected])

Preprint submitted to Neural Networks December 11, 2020

arX

iv:1

808.

0452

1v3

[cs

.CV

] 1

0 D

ec 2

020

Keywords:low rank, regularization, optimization

1. Introduction

In many fields the data we analysis are generally a set of vectors, matricesor tensors. Some examples include voices in signal processing, documentsin natural language processing, users’ records in recommender systems, im-ages/videos in computer vision, and DNA microarrays in bioinformatics.Such data are in general high dimensional, which brings a series of challengesto subsequent data analysis tasks. Fortunately, the high-dimensional datagenerally have some specific structural characteristics, e.g., sparsity. Com-pressed sensing and sparse representation are two powerful tools to analysisthe order-one signals data with sparsity, both of them naturally fit for vectorsand achieve tremendous success in several applications. However, we areoften faced with matrices or tensors data, such as images, users’ records andgenetic microarrays. Then we are naturally faced with a problem: how toleverage the sparsity of matrices or tensors? Low rank is a powerful tool tothis issue, which is a metric to second order (i.e., matrix) sparsity [1].

A typical example is recommendation system, where we have an uncom-pleted rating matrix and aim to leverage the known rates of users on someitems to infer their ratings on others. Such a problem refers to matrix com-pletion in machine learning. Mathematically, we provide a simple example inFigure 1, where r = (r1, r2, r3) is a column vector (or row vector) of ratingmatrix R. The location of r in 3D space can be exactly confirmed when thevalues of (r1, r2, r3) are known. Nevertheless, the only thing we can determineis that r locates in the line l when r3 is absent. Fortunately, the location of 3Dvector r (i.e., the value of r3) is inferable when it locates in a 2D subspace S.Hence, a natural assumption for matrix completion task is that all columnsor rows locate in a common low-dimensional subspace. That is the ratingmatrix is intrinsically low rank. Such an assumption is natural and reasonable,because users ratings for some items generally influenced by ratings for otherkinds of items. Without the Low Rank Regularization (LRR), it becomesimpossible to infer new users ratings on items.

Over the last decade, LRR sparked a large research interest from variousmachine learning models including matrix completion. And such modelshave achieved great success in many fields such as computer vision and data

2

x

y

zl

1 2 3, ,r r rr

S

Figure 1: An illustration to matrix completion using low rank regularition.

mining. Specifically, recent advances in LRR can be roughly divided into twogroups: theoretical studies and practical applications.

Theoretical studies. The main lines of research are rank relaxation andoptimization algorithms. Optimizing the LRR models inevitably involvessolving a rank minimization problem, which is known to be NP-hard. Analternative is to relax the rank-norm 1 using nuclear norm. Such that therelaxed problem is convex and can be readily solved by existing convexoptimization tools. Nevertheless, nuclear norm treats all singular valuesequally, leading to a bias to the matrix with small singular values. Towardsthis issue, arguably one of the most popular methods is using non-convexrelaxations such as TNN (Truncated Nuclear Norm) [2], WNN (WeightedNuclear Norm) [3, 4, 5] and Schatten-p norm [6, 7, 8, 9, 10]. Although non-convex relaxations are often helpful in reducing the relaxation gap, they bringa new challenging problem to algorithm designers. This is because traditionaloptimization algorithms are tailored to convex problems and inapplicablein non-convex problems. To this end, in recent years, a class of non-convexoptimization algorithms have been developed, such as IRW (Iteratively Re-weighted method) [11, 12, 13], ADMM (Alternating Direction Method ofMultipliers) [14], APG (Accelerated Proximal Gradient) method [15, 16, 17,18], and Frank-Wolfe Algorithm [19].

1Note that rank-norm is not valid norm. Here, ”rank-norm” is for convenience.

3

Practical applications. LRR is a popular tool in a wide range ofdata mining and computer vision tasks. Examples include, but not limitedto, recommendation systems [20, 21], Visual Tracking [22], 3D Reconstruc-tion [23, 24, 25], Salient Object Detection [26, 27, 28, 29]. The models used inthese tasks are generally derived from some basic machine learning models in-cluding Matrix Completion [30, 31], Subspace Clustering [32] and Multi-TaskLearning [33, 34].

In order to promote the interaction between Theoretical studies andPractical applications, some excellent reviews and surveys over LRR [35, 36,37, 1] have been made. However, most of them specialized in single group,e.g., low-rank matrix learning and its applications in image analysis [35, 37, 1],optimization algorithms used in low rank matrix learning [36]. Besides, thesestudies often revolve around the nuclear norm regularization and ignore thenon-convex relaxations. But the superiority of non-convex regularizers overnuclear norm has been verified in many theoretical studies [6, 7, 8, 9, 10, 2]. Tothis end, in this paper we provide a new review for LRR and pay more attentionto the non-convex relaxations and corresponding optimization methods. Themain contributions of this paper are summarized as follows:

• We first provide an exhaustive analysis for commonly used relaxations,including convex and non-convex relaxations. Then we summarize somerepresentative optimization algorithms for solving the relaxed low rankmodels. Both of them are absent in previous reviews and surveys.

• We conduct a great many of experiments to compare the performanceof different relaxations. The experimental results demonstrate that thenon-convex relaxations generally provide a large advantage over theconvex relaxations. Such a result is useful for promoting the applicationof non-convex relaxations in solving practical issues.

It is worth emphasizing that this work mainly focuses on the LRR modelsrather than low rank matrix learning, and the former can be considered as asubproblem of the latter. For low rank matrix learning, we recommend thebook [37]. The rest of this paper is organized as follows. We first review theTheoretical studies over LRR in Sect. 2. Then, in Sect 3 we summarize themachine learning models using LRR and their applications in solving practicalissues. To further analysis the relaxation functions, we conduct numerousexperiments and report the results in Sect 4. Finally, we conclude this paperin Sect. 5 and discuss the future work over LRR.

4

Table 1: Vevtor/matrix Norms

Type norm Defination

Vector norm`0-norm ‖x‖0 =

∑di=1 |xi|0

`1-norm ‖x‖1 =∑d

i=1 |xi|`2-norm ‖x‖2 = (

∑di=1 |xi|2)

12

Matrix norm

Rank-norm ‖X‖r =∑k

i=1 σi(X)0

Nuclear-norm ‖X‖∗ =∑k

i=1 σi(X)

f -rank-norm ‖X‖f∗ =∑k

i=1 f(σi(X))

Frobenius-norm ‖X‖F = (Tr(XXT ))12

`2,1-norm ‖X‖2,1 =∑n

i=1 ‖xi‖2

`1-norm ‖X‖1 =∑

i,j |Xij|

Notations. In this work, the vectors and matrices are denoted by low-ercase boldface letters x and uppercase boldface letters X respectively. xi,xi and Xij denote the ith row, jth column, and (i, j)th element of matrixX. Tr(X) denotes the trace of matrix X. We summarize the vectors/matrixnorms in Table 1, where σi(X) is the ith singular value of matrix X, andfunction f(x) is a function over σi(X).

2. Theoretical studies over Low Rank Regulaization

2.1. Problem formulation

Over the last decade, LRR models have attracted much attention dueto its success in various fields. All these models assume that the targetsto be learnt lie near single or multiple low-dimensional subspaces, and uselow-rankness as a regularizer to build the model. Here, we provide a formalformulation:

minXL(X) + λ‖X‖r s.t. X ∈ C , (1)

where L(X) represents the loss term that depends on the tasks we dealwith, ‖X‖r (the rank of X) represents the regularization term, while λ is aparameter balancing these two terms. Besides, C represents the constraintsover X. We are often faced with two issues in solving the LRR models. First,relaxation to LRR. Matrix rank minimization problem is known to beNP-hard, hence we need to find an alternative to replace the rank-norm ‖X‖r.

5

Second, optimization algorithm to LRR, the relaxed models is often non-convex/non-smooth, and we need to find an efficient optimization algorithmto solve the relaxed models. For these two lines of research are generallysuitable for most existing LRR models, we group them into Theoretical studies.Next, we review the recent advances in each of them.

2.2. Relaxations to LRR

Given a matrix X and its ordered singular values (σ1(X), σ2(X), . . . , σk(X)).The matrix rank-norm is equivalent to ‖σ‖0, where σ is a vector, and σi =σi(X). Similar to `0-norm minimization, matrix rank minimization is knownto be NP-hard. An alternative is using a relaxation to replace rank-normregularization. We denote the relaxed regularizer by R(X) =

∑ki=1 f(σi(X)),

where f(x) represents a relaxation function. According to the property off(x), we divide commonly used regularizers into two groups: convex relax-ations and non-convex relaxations. A summarization for these relaxationscan be found in Table 2.

2.2.1. Convex relaxations

Nuclear norm corresponds to the function f(x) = x. The connectionbetween it and rank-norm is introduced by [50], where the author report asignificant Theorem.

Theorem 1. The convex envelope of the function φ(X) = ‖X‖r, on constraintC = X ∈ Rm×n|‖X‖ ≤ 1, is φenv(X) = ‖X‖∗. Here, ‖X‖ = σ1(X) denotesthe spectral norm, i.e., the largest singular value of matrix X.

This Theorem demonstrates that we can obtain a lower bound on theoptimal solution of the rank minimization problem via solving a heuristicproblem (nuclear norm relaxation).

Although nuclear norm is convex and can be readily solved by existingconvex optimization tools, it often leads to a bias to the matrix with smallsingular values. Recently, a particular convex regularizer was developed in [51],which aims to find a convex approximation for ‖X‖r + λ‖X−X0‖2

F ratherthan the rank-norm. Hence, this regularizer will be ignored in the rest of thispaper. In addition, a Elastic-Net Regularization of Singular Values (ERSV)has been proposed in [38], which corresponds to function f(x) = x+ γx2.

6

Table 2: Commonly used relaxations in LRR. γ or p refers to the parameter used inrelaxation function, and λ represents the regularization parameter.

Name λR(X)

Nuclear norm∑k

i=1 λσiElastic-Net [38]

∑ki=1 λ(σi + γσ2

i )

Sp-norm [6, 7]∑k

i=1 λσpi

TNN [2]∑k

i=r+1 λσiPSN [39]

∑ki=r+1 λσi

WNN [3, 4, 5]∑k

i=r+1 λwiσiCNN [40, 31]

∑ki=r+1 λmin(σi, θ)

Capped Sp [31]∑k

i=r+1 λmin(σpi , θ)(0 < p < 1)

γ-Nuclear norm [41]∑k

i=1λ(1+γ)σiγ+σi

LNN [42]∑k

i=1 λlog(σi + 1)

Logarithm [43, 44]∑k

i=1λlog(γσi+1)log(γ+1)

ETP [45, 44]∑k

i=1λ(1−exp(−γσi))

1−exp(−γ)

Geman [46, 44]∑k

i=1λσiσi+γ

Laplace [47, 44]∑k

i=1 λ(1− exp(−σiγ ))

MCP [48, 44]∑k

i=1

λσi − σi

2γ , if σi < γλ,γλ2

2 , if σi ≥ γλ.

SCAD [49, 44]∑k

i=1

λx, σi ≤ λ−σ2

i +2γλσi−λ22(γ−1) , λ < σi ≤ λγ

λ2(γ+1)2 , σi > λγ

7

2.2.2. Non-convex relaxations

Although nuclear norm has achieved success in several practical appli-cations, it suffers a well-documented shortcoming that all singular valuesare simultaneously minimized. However, in real data larger singular valuesgenerally quantify the main information we want to preserve. To this end, inrecent years, a class of non-convex relaxation approaches has permeated thefields of machine learning.

The advantages of non-convex relaxations over nuclear norm are firstshown in [6, 7] for dealing with the matrix completion problems. In particular,both of them generalize the nuclear norm to Schatten-p norm 2. Consideringthat larger singular values should not be punished, Hu [2] propose a TruncatedNuclear Norm (TNN) which punishes only the n− r smallest singular values.A similar regularizer, namely Partial Sum Nuclear Norm (PSNN), is developedin [39]. Indeed, both TNN and PSNN can be considered as special cases ofCapped Nuclear Norm (CNN) used in [40]. To alleviate rather than abandonthe punishment on larger singular values, Gu et al. [3] propose a weightednuclear norm:

‖X‖w∗ =∑k

i=1 wiσi(X) , (2)

where w = [w1, w2, . . . , wk] is weight vector, and wi ≥ 0 is a weight valueassigned to σi(X). Obviously, the punishment bias between larger valuesand small values can be alleviated by assigning small weights to the formerand larger weights to the latter. In addition, TNN, PSNN and CNN can beconsidered as the special cases of WNN with weight vector:

w = [0, 0, . . . , 0︸︷︷︸r

, 1, . . . , 1︸︷︷︸k−r

] .

In addition to the relaxations mentioned above, numerous non-convexrelaxations derived from sparse learning have been proposed, such as γ-nuclear norm [41], Log Nuclear Norm (LNN) [42], ETP [45], Logarithm [43],Geman [46], Laplace [47], MCP [48], and so on. The details of using them totackle the matrix completion problems can be found in [44, 52, 17, 53].

Matrix factorization is another method for low-rank regularization, whichrepresents the expected low rank matrix X with rank r as X = UVT , whereU ∈ Rm×r and V ∈ Rn×r. Moreover, the equation (3) [54] has been used to

2Actually, Schatten-p norm refers to the Schatten-p quasi norm when 0 < p < 1.

8

0 0.5 1 1.5 2 2.5 3b

0

0.5

1

1.5

2

2.5

3

Shri

nked

b

(a) Log

0 0.5 1 1.5 2 2.5 3b

0

0.5

1

1.5

2

2.5

3

Shri

nked

b(b) ETP

0 0.5 1 1.5 2 2.5 3b

0

0.5

1

1.5

2

2.5

3

Shri

nked

b

(c) Laplace

0 0.5 1 1.5 2 2.5 3b

0

0.5

1

1.5

2

2.5

3

Shri

nked

b

(d) Sp with p = 0.1

0 0.5 1 1.5 2 2.5 3b

0

0.5

1

1.5

2

2.5

3

Shri

nked

b

(e) Sp with p = 0.5

0 0.5 1 1.5 2 2.5 3b

0

0.5

1

1.5

2

2.5

3

Shri

nked

b(f) nuclear norm

Figure 2: Comparisons between b and ρf (b, λ) with λ = 1, where f represents the relaxationfunction that we select to relax the rank-norm. For ETP, we set γ = 1. For Laplace, weset γ = 0.5.

solve the matrix completion problem.

‖X‖∗ = minA,B:ABT =X

1

2(‖A‖2

F + ‖B‖2F ), (3)

where A ∈ Rm×d, B ∈ Rn×d and d ≥ r. Recently, Shang et al. [8] developtwo variants of Eq. (3), namely Double Nuclear norm penalty (‖X‖D−N ) andFrobenius/nuclear hybrid norm penalty (‖X‖F−N). Both of them focus onthe connection between Schatten-p norm and matrix factorization.

Closed-form solutions to relaxations. An interesting property ofmost existing relaxations mentioned above is that the closed-form solutionscan be obtained directly in a specific minimization problem.

minX

1

2‖X−M‖2

F + λR(X) . (4)

For nuclear norm, i.e., R(X) = ‖X‖∗, the close-form solution of problem (4)can be obtained directly via a Singular Value Thresholding (SVT) operator

9

ρ(x, λ) [55].X = Uρ(S, λ)VT , (5)

where USVT = M is the SVD of M, and

ρ(S, λ)ii = max(Sii − λ, 0) . (6)

The Eq. (6) shows that nuclear norm treats all singular values equally andshrinks them with the same threshold λ. Such a result is consistent withthe analysis mentioned above. Besides, Gu et al. [3] propose a weightedSVT (WSVT) operator ρw(x, λ) to weighted nuclear norm, i.e., R(X) =‖X‖w∗. Correspondingly, we have:

ρw(S, λ)ii = max(Sii − wiλ, 0) . (7)

It alleviates the punishment bias between larger singular values and smallsingular values via assigning small weights to the former and larger weightsto the latter. Recently, Lu et al. [53] generalize SVT to a more general case:

minX

1

2‖X−M‖2

F + λk∑i=1

f(σi(X)) . (8)

where f(x) can be anyone continuous function satisfying the condition: Func-tion f(x) is concave, nondecreasing and differentiable on [o,+∞). Besides,f(0) = 0 and ∇f is convex. Some representatives including Schatten-p norm,MCP [48], Geman [46] and Laplace penalties [47] are reported in Table 2.Further, they provide a Generalized Singular Value Thresholding (GSVT)operator ρg(x, λ). Specifically,

ρf (x, λ) = argminx≥0

fb(x) = λf(x) +1

2(x− b)2 , (9)

where b denotes a singular value of M. A general solver for finding the optimalsolution of the problem (9) has been provided in [53] 3. To demonstrate thedifference between different relaxations including nuclear norm, we reportthe shrinkage results returned by them in Figure 2, where the differencebetween b and ρf(b, λ) represents the shrink. Figure 2 shows that when b

3We provide the codes in: https://github.com/sudalvxin/2018-GSVT.git

10

takes a small value the shrinkage effect of different relaxations are similar.Nevertheless, when b takes a large value the difference between non-convexrelaxations and nuclear norm are significant. Furthermore, the shrink ofnon-convex relaxations on larger values are very small, which is contrastwith the nuclear norm that takes serious shrinks on larger singular values.Hence, using non-convex relaxations can preserve the main information ofM. In addition, one can find that non-convex relaxations prefer to generate0 singular values, i.e., the low rank solution when regularization parameter λtakes the same value.

2.3. Optimization algorithms to LRR

Solving a problem with low rank regularization has drawn significantattention, and a great many of specialized optimization approaches have beenproposed. In this section we review some popular optimization methods thatare suitable for both convex relaxations and non-convex relaxations. Note thatsome traditional optimization methods to LRR models are omitted due totheir limitations in solving practical issues. Examples include Singular ValueProjection (SVP) algorithm [56] and fixed-point continuation algorithm [57].

2.3.1. Frank-Wolfe (FW) Algorithm

The FW algorithm [58, 59] is one of the earliest first-order approaches forsolving the problems of the form:

minx∈C

f(x), (10)

where x can be a vector or matrix, f(x) is Lipschitz-smooth and convex. FWis an iterative method, and at t+ 1 iteration, it updates x by

mt = arg minm∈C

mT∆f(xt), (11)

γt = arg minγ∈[0,1]

f((1− γ)xt + γmt) , (12)

xt+1 = (1− γt)xt + γtmt , (13)

where Eq. (11) is a tractable subproblem. The convergence rate of FWalgorithm is O(1/T ) [59]. The details of using FW to solve the LRR modelswith nuclear norm can be found in [59]. In addition, an improved methodfor matrix completion problem can be found in [60]. Recently, Yao et al. [52]generalize FW to tackle with non-convex relaxations via redistributing non-convexity of regularization term to loss.

11

2.3.2. Proximal Gradient (PG) algorithm

Proximal Gradient (PG) algorithm is commonly used method for solvingthe unconstraint optimization problem of the form:

minxf(x) + g(x) , (14)

where f(x) is convex and L-Lipschitz smooth, and g(x) is convex. In t + 1iteration, PG updates x by:

xt+1 = arg minxf(xt) + (x− xt)T∆f(xt) +

L

2‖x− xt‖2

2 + g(x)

= arg minx

1

2‖x− xt +

1

L∆f(xt)‖2

2 + g(x) .

(15)

The convergence rate of PG is also O(1/T ). But, it can be accelerated toO(1/T 2) by incorporating Nesterov’s technique [61]. Ye et al. [62] use APGto solve the nuclear norm minimization problem. The case that f(x) org(x) is non-convex has been studied in [15, 16, 17, 18]. In [15], the authorsdevelope two APG-type algorithms, named monotone APG (mAPG) and non-monotone APG (nmAPG), respectively. Both of them replace the descentcondition used in [63] by a sufficient descent condition. In addition, anInexact Proximal Gradient algorithm (IPG) was developed in [18] to reducethe computation cost caused by two proximal mappings, and a fast proximalalgorithm was developed in [17] to reduce the computation cost caused byconducting SVT over a large scale matrix. Note that most of existing PG-typealgorithms are constructed for solving the problem that is unconstrained (overdesired matrix) and has only one variable to optimize 4. Hence, although ithas sound theoretical guarantee in terms of convergence, it is rarely used insolving the practical issues where various constraints must be considered andmultiple variables must be optimized simultaneously. Conversely, generalizingAPG methods to tackle with complicated models is a promising direction.

2.3.3. Iteratively Re-Weighted (IRW) Algorithm

The Iteratively Re-Weighted algorithm (IRW) is primitively designed forsolving a class of sparse learning problems of the form (11) [11, 12, 13]. It is

4In [17], the authors extend it to cope with the problem involved two separable parameterblocks.

12

intrinsically derived from the Majorization-Minimization (MM) algorithm [12].Specifically, at t+ 1-th iteration, IRW updates x by:

xt+1 = arg minxf(x, xt) + g(x) , (16)

where f(x, xt) is a convex envelope of f(x) at point xt. The essence of MMis iteratively solving a series of subproblems that are amenable to existingfirst-order methods. Iterative Reweighted Least Squares algorithms (IRLS) [6]is a seminal work using IRW to handle the rank minimization problem, anda similar algorithm was developed in [7]. Both of them focus on solving thematrix completion problems with Schatten-p norm (0 < p ≤ 1). Recently,Kummerle et al [64] provide a Harmonic Mean Iteratively Re-Weighted (HM-IRW) for such models, which exhibits a locally superlinear rate of convergenceunder some specific conditions. Nie et al. [65] study a more general casethat the relaxation function f(x) used in R(X) is any differentiable concavefunction. In addition, an Iteratively Reweighted Nuclear Norm (IRNN)method has been proposed in [44], which can be considered as a combinationof PG and IRW. Similar to PG-type method, IRW is constructed for dealingwith the problems with single variable. So, its feasibility in practical issues isalso limited.

2.3.4. Alternating Direction Method of Multipliers

Alternating Direction Method of Multipliers (ADMM) [66] is a popularlyused method for solving constraint optimization problem of the form:

minx,y

f(x) + g(y)

s.t. Ax+ By = c ,(17)

where f(x) and g(x) are convex. Suppose the augmented Lagrangian functionof Eq. (17) is:

L(x, y, u) = f(x) + g(y) + uT (Ax+ By − c) +µ

2‖Ax+ By − c‖2

2 , (18)

where u denotes the Lagrangian multipliers, while µ denotes the penaltyparameter. At t+ 1-th iteration, ADMM update x and other variables by:

xt+1 = arg minxL(x, yt, ut), (19)

yt+1 = arg minyL(xt, y, ut), (20)

ut+1 = ut + µ(Ax+ By − c). (21)

13

A comprehensive survey and an useful tool for ADMM can be found in [14],where a unified optimization framework is provided. Theoretically, ADMMconverges to a critical point for convex problems. However, its convergenceis still an open issue when optimization problem is non-convex. Besides, apractical variant of ADMM, namely Relaxed ADMM, was discussed in [67, 68].

2.3.5. Discussion and summarization

We have described four representative algorithms for solving the LRRmodels. A discussion over these algorithms is as follows:

• All algorithms can deal with the LRR models with non-convex relax-ations. And the core is solving a subproblem iteratively.

• FW and PG-type algorithms can deal with the problems with singlevariable or multiple separable variables. IRW is parameter-free, butit is suitable for the problems with single variable. For LRR modelswith multiple variables, IRW can ensure the objective function valueof original problem being monotonously decreasing when we use thealternating minimization method to update the variables in subproblem.

• Compared with FW, PG-type and IRW, ADMM is more suitable for com-plicated problems with equality constraints and multiple non-separablevariables. The reason is that in ADMM the low rank regularization overany one variable can be transferred to an auxiliary variable, and thecorresponding subproblems can be solved directly using SVT or GSVT.However, there are two shortcomings in ADMM: parameter selection (µ)and low convergence speed.

• The time consuming of IRW is high on large scale data, because itneeds to compute the full SVD of X or the eigenvalue decomposition ofXXT (XTX) [65] in subproblem. For ADMM and PG-type algorithms,the subproblem often has a form of Eq. (8). It is obvious that inEq. (8) the ith singular value of X is zero when σi(M) is smaller thana specific threshold. That is, only several leading singular values ofM are needed in SVT and GSVT [17]. Such that the computationalcomplexity of SVT and GSVT can be reduced from O(mn2) to O(mnr)by using PROPACK [69], where r represents the number of leadingsingular values. In addition, an useful tool for further reducing the timeconsuming is using approximate SVT or GSVT [17, 70].

14

3. Applications of Low Rank Regularization

Over the last decade, LRR sparked a large research interest from variousmachine learning models. According to the type of target to be learnt, weroughly group these models into two categories: (1) data learning models, thetarget to be learnt is a data matrix, such as Robust Principal ComponentAnalysis and Robust Matrix Completion; (2) coefficient learning models,the target to be learnt is a coefficient (weight) matrix, such as SubspaceClustering and Multi-Task Learning. In this section, we first review thesefour representative machine learning models and then show their applicationsin practical issues.

3.1. Applications of LRR in machine learning

3.1.1. Robust Principal Component Analysis (RPCA)

In most cases, the fundamental assumption for using LRR is that thedata we collected lies near some low-dimensional subspaces. For instance,users’ records (such as ratings for movies) in recommender systems, andimages in computer vision. In the real world, however, the data are generallycorrupted by noise and outliers. RPCA [71] is one of the most significanttools for recovering a low-rank matrix robustly from noisy observations.Mathematically, RPCA assumes that the data matrix M is the sum of a lowrank matrix X and a noise matrix E and use the following LRR models torecover X.

minX,E

‖E‖1 + λ‖X‖r s.t. M = X + E , (22)

where ‖E‖1 is suitable for the sparse noise and can be replaced by othermatrix norms.

3.1.2. Robust Matrix Completion (RMC)

RMC [2] is one of the most important variants of RPCA, which considersa general case that some entries of the input data matrix M are missing,and the known entries are corrupted by noise. The goal of RMC is utilizingthe known information to estimate the values of missing entries. The basicassumption used by RMC is that the complete matrix X we aim to recover islow rank or approximately low rank. Correspondingly, the problem of RMCcan be solved by using a model of the form:

minX‖PΩ(X−M)‖1 + λ‖X‖r , (23)

15

where PΩ represents a projecting operator, Ω represents a set recording theindices of known entries. The entries of matrix PΩ(X) are consistent with Xon Ω and are 0 on residuals.

3.1.3. Multi-Task Learning (MTL)

Both RPCA and RMC are based on an assumption that the data matrixwe are faced with is low rank or approximately low rank. That is, the datawe collected is relevant. The relatedness among different samples furtherinspires researchers to explore the relatedness among different tasks. GivenK relevant tasks TiKi=1 accompanied by feature matrix Xi ∈ Rni×dKi=1

and target vectors yi ∈ Rni×1Ki=1, we can learn them simultaneously toimprove the generalization performance of each one. Such a problem refers toMulti-Task Learning (MTL). A general LRR model for MTL is:

minW

K∑i=1

‖Xiwi − yi‖22 + λ‖W‖r , (24)

where W ∈ Rd×K is a weight matrix, and its ith column wi is the weightvector to task Ti. The relatedness among K tasks imply that the structure ofW is low rank or approximately low rank [72].

3.1.4. Subspace Clustering (SC)

Given a set of data points approximately drawn from a union of multiplesubspaces, the goal of SC is to partition the data points into their respectivesubspaces. To this end, Liu et al. [32] propose a low rank representationmodel, which seeks a low rank matrix W that consists of the candidates ofall data points in a given dictionary D. A general morel for SC using LRR is:

minW‖DW −X‖2,1 + λ‖W‖r , (25)

where the learnt low rank matrix W can be seen as a rough similarity matrix,and the final partitioning result can be obtained by conducting spectral

clustering with a refined similarity matrix Z = |W|+|WT |2

. A significant variantof Eq. (25) is LatLRR [73], which jointly learns the low rank affine matrixW and the low rank basis matrix L using a model of the form:

minW,L‖W‖r + ‖L‖r + λ‖X−XW − LX‖1 . (26)

16

LatLRR can be considered as a combination of low-rank representation andInductive Robust Principal Component Analysis [74]. Such a model and itsvariants [75, 76, 77] have been widely used to deal with computer vision taskssuch as image recovery and denoising.

3.1.5. Summarization

For each task mentioned above, there are a great many of algorithms havebeen developed. For instance, RPCA [40, 78, 4], MC [30, 31, 79, 80, 44, 81, 82],MTL [72, 33, 34, 83, 62, 84, 65, 85] and SC [86, 87, 32, 88, 89, 90, 91].Nevertheless, the main differences between these algorithms are loss functionor regularization term. A short discussion for loss functions widely used inmachine learning can be found in [92]. The details over the regularizationterm can be found in Sect. 2.2.

Furthermore, it is worth noting that in addition to the models mentionedabove, LRR has achieved success on other machine learning tasks. Examplesinclude Component Analysis [93, 94], Compressive Sensing [95], Multi-ViewLearning [96, 97, 98, 99, 100, 101], Transfer Learning [102, 103], SpectralClustering [104], Dictionary Learning [105], Metric Learning [106], FeatureExtraction [91, 107] and so on. Indeed, most of them, including LRR andMTL, are derived from RPCA and MC. Hence, the recent progress achievedin RPCA and MC may be useful for further improving the performance ofexisting algorithms.

3.2. Applications of LRR in practical tasks

A basic assumption in machine learn models mentioned above is that thecolumns or rows of target matrix are relevant. Such an assumption is oftensatisfied in real data. We provide a summarization for practical issues solvedby using LRR models in Table 3. Note that the models used in solving theseissues are partly derived from the machine learning models mentioned above.Next, we describe the details of some representatives among these tasks.

3.2.1. Video background subtraction

Video background subtraction [71] aims to recover the background com-ponent of a video data. Given a video with n gray images of size wh, wevectorize each image as a vector x ∈ Rwh. Stacking all vectors, we can obtaina matrix X ∈ Rwh×n. The problem of video background subtraction can besolved by solving a RPCA model of the form:

minL,E

‖E‖1 + λ‖L‖r , s.t. X = L + E , (27)

17

Table 3: Practical issues solved by using Low Rank Regularization. More details withrespect to constructed models and corresponding optimization methods can be found inreferences.

Application Regularizer R(X) Optimization

Face Analysis [108, 109, 110, 93] ‖X‖∗ ADMM

Person Re-Identification [111] ‖X‖∗ others

Visual Tracking [22, 112] ‖X‖∗ others

3D Reconstruction [23, 24, 25] ‖X‖∗ ADMM

Image denoising [4, 113, 114, 115]∑k

i=1wiσi ADMM

Structure Recovery [116]∑r

i=1wiσi others

Video Desnowing and Deraining [117] ‖X‖∗ AM

Salient Object Detection [26, 27, 28, 29] ‖X‖∗ ADMM

Face Recognition [118, 119, 120] ‖X‖∗ ADMM

High Dynamic Range Imaging [121, 78] ‖X‖∗,∑r

i=k+1 σi ADMM

Head Pose Estimation [122] ‖X‖∗ ADMM

Moving Object Detection [123] ‖X‖∗ ADMM

Reflection Removal [124] ‖X‖∗ others

Zero-Shot Learning [125]∑k

i=r+1 σi others

Speckle removal [126]∑k

i=r+1wiσi ADMM

Image Completion [127]∑k

i=r+1wiσi ADMM

Image Matching [128] 12(‖U‖2F + ‖V‖2F ) ADMM

Video Segmentation [129] ‖UV‖2F others

Image alignment [130, 131, 132, 133] ‖X|∗ ADMM

Image Restoration [134] ‖X‖∗ ADMM

Image Classification [135, 136, 137, 138, 139, 140] ‖X‖∗ ADMM

AAM fitting [141] ‖X‖∗ ADMM

Image Segmentation [142] ‖X‖∗ ADMM

Motion Segmentation [143] ‖X‖∗ ADMM

Colorization [144, 145] ‖X‖∗ ADMM

photometric stereo [146] ‖X‖∗ ADMM

Textures [147] ‖X‖∗ ADMM

Behavior Analysis [148] ‖X‖Sp ADMM

Heart Rate Estimation [149] ‖X‖∗ ADMM

Text Models [150] ‖X‖∗ others

Rank Aggregation [151] ‖X‖∗ ADMM

Deep Learning [152, 153, 154] ‖X‖∗ SGD

18

where L is a low rank matrix, recording the background components ofvideo, while E is a sparse matrix, recording the foreground components ofvideo. In order to achieve better splitting results, numerous improvementsfor Eq. (27) have been made. In [4], the nuclear norm is replaced by non-convex regularizers to achieve better low rank approximation. In orderto cope with the nonrigid motion and dynamic background, a DECOLORmethod (DEtecting Contiguous Outliers in the LOw-rank Representation)is proposed in [155]. Recently, Shakeri et al. [123] construct a low-rank andinvariant sparse decomposition model to reduce the effect caused by variousillumination changes. For background subtraction is one of the indices toevaluate the performance of RPCA algorithms, it has received a great manyof attentions, and a comprehensive survey can be found in [156] 5.

3.2.2. Recommender System

The goal of recommender system is leveraging some prior informationon user behavior to predict user preferences [157]. The Collaborative Filter-ing (CF), as exemplified by low rank matrix learning methods, have provensuccessful in dealing with this task. In order to predicate the preferencesof users, one would like to recover the missing values of a rating matrix M.Such a task can be solved by using a matrix completion model of the form:

minX‖PΩ(X−M)‖2

F + λ‖X‖r , (28)

where X denotes the complete rating matrix. This task is a significantmetric for evaluating the performance of matrix completion models and hasattached much attention in recent years. The main lines of research are rankminimization [18, 17, 21] and matrix factorization [20, 158]. Particularly, thelatter often assumes that the rank r of target matrix is known and replaceX ∈ Rn×m with two new variables U ∈ Rm×r and V ∈ Rr×n, where X = UV.An overview for solving such a classes of non-convex optimization problemshas been provide in [159].

3.2.3. Image denoising

Image denoising, in essence, involves estimating the latent clean imagefrom an noisy image, which is a fundamental problem in low level vision.The success of several state-of-the-art image denoising algorithms [160, 3] is

5BS library: https://github.com/andrewssobral/lrslibrary

19

based on the exploitation of image nonlocal self-similarity, which refers to theassumption that for each local patch in a natural image, one can find somesimilar patches to it. Given an noise image, Gu et al. [3, 161] propose tovectorize all similar patches as column vectors and stacking them as a noisymatrix M. Further, the clean patches can be recovered by using a RPCAmodel of the form:

minX‖X−M‖2

F + λ‖X‖r , (29)

where X denotes the clean patches we aim to recover. Similar models withdifferent regularizers have been proposed in [114], and a different model withsimilar regularizer has been developed in [162, 163].

3.2.4. Image alignment

Image Alignment (IA) involves transforming various images into a commoncoordinate system. Given a set of images, Peng et al. [132] vectorize eachimage as a vector m and stack all vectors as a matrix M. The alignmentimages X can be recovered by using a model of the form:

minX,E,τ ‖E‖1 + λ‖X‖r s.t. M τ = X + E , (30)

where τ denotes a transformation, E denotes the noise or the differencesamong images. There are two variants for this method [141, 130], but both ofthem select the nuclear norm to serve as regularizer and the ADMM methodto optimize the corresponding models.

3.2.5. Non-rigid structure from motion

Non-rigid Structure From Motion (NSFM) is a powerful tool for recoveringthe 3D structure and pose of an object from a single image, which is well knownto be an ill-posed problem in literature due to the non-rigidity. Assumingthat the nonrigid 3D shapes lie in a single low dimensional subspace, Dai etal. [24] propose to incorporate low rank prior into NSFM. Specifically, themodel for inferring the 3D coordinates is:

minX,E

‖X‖r + λ‖E‖1 s.t. W = RX# + E , (31)

where W consists of the 2D projected coordinates of all data points, Xdenotes the 3D coordinates we aim to recover. In addition, the definitions ofR and X# can be found in [24]. Obviously, such a formulation is derived from

20

RPCA. Subsequently, in order to cope with the complex nonrigid motion that3D shapes lie in a union of multiple subspaces rather than a single subspace,Zhu et al. [25] propose a subspace clustering based model. This model learnsa 3D structure matrix X and an affinity matrix Z, simultaneously. And Z isused to obtain the final clustering result. Besides, a more complicated casethat the 2D point tracks contain multiple deforming objects is studied in[164, 165, 23].

3.2.6. Deep learning with LRR

Deep learning is a powerful tool to tackle various tasks, which leverage DeepNeural Networks (DNN) to integrate features learning and weights learninginto a unified framework. Recently, numerous studies [152, 153, 166, 167]propose to introduce the low rank regularization into deep learning. Similarly,all these methods can be grouped into two categories including low rankfeatures learning [168, 153] and low rank weights learning [167]. To learnlarge-margin deep features for classification problem, Lezama et al. [153]propose an Orthogonal Low-rank Embedding (OLE) method, which enforcesthe intra-class embeddings falling in a subspace aligned with the correspondingweight vector, while embeddings of different classes be orthogonal to eachother. The OLE loss is:

minX

=C∑c=1

max(∆, ‖Xc‖r)− ‖X‖r , (32)

where X denotes the deep embeddings of data, while Xc is a sub-matrixof X, which consists of the embeddings from the class c. Besides, ∆ is aparameter to avoid the collapse of embeddings to zero. By combining OLEloss and stand cross-entropy loss, the author achieve a better result in imageclassification task. Subsequently, Zhu et al. [169] generalize OLE loss to serveas a regularization term to improve the generalization ability of DNN.

Low rank weights learning refers to train a DNN whose weights are lowrank or approximately low rank, which plays on significant role in the taskof compressing DNN model [170]. In order to compress the size of trainednetwork, Xu et al. [166] integrates low rank approximation (over weights) intothe procedure of training deep neural networks. Such a strategy can guaranteethat the trained network has a low-rank structure in nature, and eliminatethe error caused by fine-tuning. Besides, Piao et al. [167] leverage LRR todeal with the deep subspace clustering problem, where learnt low rank weight

21

matrix is consistent with the affine matrix of traditional subspace clustering.Recently, Sanyal et al. [171] propose to use Stable Rank Rormalization (SRN)to improve the generalization performance of DNN.

3.3. Summarization

In addition to the tasks mentioned above, LRR have been applied into othertasks, including Visual Tracking [22, 112, 172], salient object detection [26, 27,28, 29], face analysis [108, 110, 93] and so on. A comprehensive summarizationfor these tasks are reported in Table 3. Indeed, most of these algorithms arederived from the machine learning models mentioned above. But, observingTable 3, one can find that in these tasks nuclear norm often serves as therelaxation function, while ADMM generally serves as the optimization method.And researchers do not pay enough attention to the recent progress made innon-convex relaxation and optimization. Such an issue is also the motivationof this review. We prefer to promote the application of non-convex relaxationsin practical issues. Next, we experimentally verify the advantage of non-convexrelaxations over nuclear norm.

4. Comparison between different relaxations to LRR

For the main goal of this paper is promoting the application of non-convexrelaxations in solving practical issues, in this section we would like to take acomprehensive investigation for several non-convex relaxations.

A huge number of LRR models have been developed for addressing variousdata analysis tasks in the past decade. However, the loss function and regu-larization term are the main differences between these models. Particularly,the selection of loss function depends on the problem we aim to solve. Takinga comprehensive comparison for all tasks beyond the scope of our ability. Inthis work, we conduct our investigation on image denoising task. As presentedin [4], the core in this task is solving a LRR model as follows:

minX‖X−M‖2

F + λk∑i=1

f(σi(X)) . (33)

The reasons of using such a task include:

1. As discussed above, the problem (33) is a basic problem that we cannotavoid when dealing with complicated LRR models;

22

Figure 3: The 8 test images used in this paper.

2. The model (33) has only one parameter needed to be tuned. Such thatthe results can be analysed conveniently.

The relaxations to rank-norm include: Weighted Nuclear Norm (WNN), LogNuclear Norm (LNN), Truncated Nuclear Norm (TNN), and Schatten-p normwith p = 1, p = 0.5, = 0.01, respectively. Particularly, Nuclear Norm (NN)is equivalent to the Schatten-p norm with p = 1. For each regularizer wecan obtain the solution of problem (33) directly via SVT or GSVT. Here, weabandon the Capped Nuclear Norm, ETP, Laplace, Geman and so on, for allof them have additional parameters needed to be tuned.

4.1. Experimental setting

We select 8 widely used images with size 256 × 256 to evaluate thecompeting regularizers. The thumbnails are shown in Fig 3. Similarly, wecorrupt the original image by Gaussian noise with distribution N (0, σ2).

We use the codes provided by Gu et al. [4] 6. For each patch, in [4] theauthors run K iterations of this approximation process to enhance the qualityof denoising. In this paper we fix K = 1 and other parameters as authors

6https://sites.google.com/site/shuhanggu/home

23

0

10

1

PSN

R

2

20

3 4 5 R66 R

5R47

RegularizerR38 R

29 R1R

0

(a) τ = 0.01

0

10

1

PSN

R

2

20

3 4 5 R66 R

5R47

RegularizerR38 R

29 R1R

0

(b) τ = 0.1

0

10

1

PSN

R

2

20

3 4 5 R66 R

5R47

RegularizerR38 R

29 R1R

0

(c) τ = 1

Figure 4: A comprehensive comparison between different regularizers on image Peppers.R0 refers to the PSNR of input noisy image. R1, R2, . . . , R6 refer to the regularizersWNN, LNN, TNN, Sp with p = 0.1, Sp with p = 0.5, and NN (Nuclear Norm, Sp withp = 1), respectively. Note that all relaxations, except for TNN, will shrink all singularvalues to 0 when λ takes a large value.

0

10

1

PSN

R

2

20

3 4 5 R66 R

5R47

RegularizerR38 R

29 R1R

0

(a) τ = 0.01

0

10

1

PSN

R

2

20

3 4 5 R66 R

5R47

RegularizerR38 R

29 R1R

0

(b) τ = 0.1

0

10

1

PSN

R

2

20

3 4 5 R66 R

5R47

RegularizerR38 R

29 R1R

0

(c) τ = 1

Figure 5: A comprehensive comparison between different regularizers on image Lena. R0

refers to the PSNR of input noisy image.

suggested to avoid introducing additional parameters. More implementationdetails over experiments can be found in [4].

For all regularizers except for WNN we set λ = τη√npσ

2, where np is thenumber of similar parts, σ represents the noise level, τ controls the large scalerange of varying λ, and η controls the small scale range of varying λ. ForWNN, we set λw = C ∗ λ, where C =

√2 as authors suggested. For TNN, we

fix r = 5 for all test images.Comparison 1. In this test, we fix σ = 50 and vary τ in the set

Sτ = 0.1, 1, 10, η in the set Sη = 1, 2, . . . , 9. We select only Lena andPeppers two images to test. The PSNR results under different parameters forall competing relaxations are shown in Figure 4 and Figure 5, where R0 refers

24

to the PSNR of input noisy image, R1, R2, . . . , R6 refer to the relaxationsWNN, LNN, TNN, Sp with p = 0.1, Sp with p = 0.5, and nuclear norm,respectively. According to the definition of SVT and GSVT, we can find thatall relaxations will shrink all singular values to zero when λ takes a largevalue (Under this case, we set PSNR = 1 for visualization). Observing theFigure 4 and Figure 5, we can achieve the follow conclusions.

• Non-convex relaxations consistently outperforms nuclear norm (convexrelaxation) in terms of the best result;

• We should select a smaller regularization parameter λ for nuclear norm,for it prefers to shrink all singular values to zero when λ takes a largevalue (see the cases that τ > 1 and η > 5). Nevertheless, the shrinkfor singular values may be insufficient when λ takes small value (seethe case that τ = 0.01 and 1 ≤ η ≤ 4). Such a contradiction limits theperformance of nuclear norm;

• TNN performs well even when λ takes a large value, for it preventsthe r largest singular values from being shrinked. Nevertheless, twolimitations of it cannot be ignored. First, the value of r must beestimated. Second, the r largest singular values generally carry noiseinformation we want to remove, and preserving them may degeneratethe performance of algorithm;

• We should select a larger regularization parameter λ for non-convexrelaxations. When λ takes a small value, the shrink on all singularvalues is vary slight. Hence, the PSNRs of them are very close to originalimages. Non-convex relaxations can reduce the shrink on larger singularvalues and enhance the shrink on smaller singular values simultaneouslycompared with nuclear norm.

In this test, we ignore the numerical differences (PSNR values) betweendifferent non-convex relaxations, because such a difference can be compressedby carefully selecting regularization parameter for each regularizer.

Comparison 2. In this test, for each regularizer we fix λ as the valueachieving the best performance in above test. More specifically, we fix τη = 5for WNN, τη = 2 for LNN, τη = 0.03 for TNN and nuclear norm, τη = 8for Sp with p = 0.1, τη = 0.5 for Sp with p = 0.5. Besides, we follow [4]and conduct WNN iteratively. The experimental results are used to serve as

25

Table 4: Experimental results (PSNR,dB) on the 8 test images. Here, WNN-I refers tothe algorithm proposed in [4], which iteratively conduct the reconstruction process on allpatches. The best results (without considering WNN-I) are highlighted in bold.

Noise Images WNN-I WNN LNN TNN (r = 1) Sp(p = 0.1) Sp(p = 0.5) Nuclear Norm

σ = 10

Airplane 34.732 34.352 34.130 30.146 34.495 31.527 30.194Boat 33.727 33.186 33.309 30.023 33.512 31.090 30.051

Couple 33.735 33.247 33.240 30.019 33.551 31.006 30.036Fingerprint 31.069 30.907 30.696 29.420 30.999 29.783 29.422

Hill 33.715 33.300 33.386 30.075 33.629 31.101 30.101Lena 35.721 35.308 35.026 30.282 35.496 31.724 30.318Man 33.436 32.998 33.069 29.998 33.329 30.963 30.022

Peppers 35.965 35.593 35.149 30.292 35.736 31.714 30.323

σ = 30

Airplane 28.749 28.077 28.200 26.357 28.242 27.176 26.470Boat 27.875 27.144 27.413 26.013 27.273 26.588 26.045


Hill 28.472 27.759 28.013 26.436 27.868 27.025 26.492Lena 29.929 29.060 29.167 26.970 29.186 27.803 27.094Man 27.897 27.337 27.535 26.089 27.471 26.589 26.119

Peppers 29.986 29.239 29.202 26.992 29.356 27.732 27.069

σ = 50

Airplane 26.193 25.407 25.408 24.811 25.471 25.3815 24.463Boat 25.537 24.789 24.871 24.472 24.825 24.802 23.980


Hill 26.349 25.572 25.615 25.432 25.587 25.570 24.962Lena 27.476 26.622 26.519 26.243 26.699 26.526 25.811Man 25.710 25.075 25.098 24.782 25.128 25.053 24.316

Peppers 27.235 26.411 26.236 26.052 26.518 26.291 25.524

σ = 70

Airplane 24.550 23.684 23.481 23.068 23.967 23.772 22.464Boat 24.144 23.228 23.120 22.945 23.406 23.180 22.221


Hill 25.023 24.054 23.898 24.056 24.315 24.106 23.169Lena 26.011 24.699 24.420 24.573 25.122 24.929 23.626Man 24.366 23.459 23.299 23.331 23.712 23.499 22.457

Peppers 25.360 24.277 23.942 24.155 24.729 24.584 23.132

σ = 100

Airplane 23.086 22.015 22.193 21.723 22.318 21.764 21.022Boat 22.737 21.690 21.842 21.623 21.853 21.446 21.024


Hill 23.597 22.406 22.605 22.583 22.609 22.051 21.795Lena 24.531 22.660 22.881 22.749 23.051 22.430 21.650Man 23.016 21.832 22.017 21.981 22.067 21.531 21.051

Peppers 23.648 22.258 22.447 22.364 22.708 22.085 21.144

26

reference. The noise level is controlled by varying the parameter σ in the setSσ = 10, 30, 50, 70, 100. The PSNR results are reported in Table 4. Theinformation delivered by Table 4 can be summarized as follows:

• WNN-I achieves the best result in all cases due to conducting the recon-struction process iteratively. But, it provides only a small advantageover others that conduct only one iteration, especially the best onehighlighted in bold.

• Non-convex relaxations outperform nuclear norm in most cases. Par-ticularly, the difference between NN and TNN is very small when σtakes a small value. But, residual non-convex relaxations provide alarge advantage over both TNN and NN.

• The regularizer Sp with p = 0.1 achieves the best results in most cases.The reason is that the gap between Sp with p = 0.1 and nuclear normis very small. A natural idea is: using the regularizer Sp with p < 0.1can generate a better result. In practice, however, we do not supportto use such a regularizer, because it treats all singular values almostequally. For instance, 100.01 ≈ 1.0233 and 10000.01 ≈ 1.0715.

5. Conclusion and Discussion

In this paper, we provided a comprehensive survey for low rank regulariza-tion including relaxations to rank-norm, optimization methods and practicalapplications. Differing from previous investigations that focus on nuclearnorm, we pay more attention to the non-convex relaxations and correspondingoptimization methods. Although the nuclear norm with solid theoreticalguarantee has been widely used to solve various problems, the solution re-turned by it may deviate from the original problem significantly. Indeed, sucha deviation can be alleviated by using non-convex relaxations. In order topromote the application of non-convex relaxations in solving practical issues,we give a detailed summarization for non-convex relaxations used in previousstudies. Although the theoretical research over non-convex relaxation is stilllimited, numerous studies have experimentally verified that it often provide alarge advantage over nuclear norm.

An inevitable thing in solving LRR models (without considering matrixfactorization) is conducting SVD, which is time consuming for large scaledata. So, introducing the techniques over approximate SVD is an important

27

direction in the future [17]. Besides, more recent efforts have been made intensor learning [173, 174, 175, 176, 177, 178, 179, 180, 181, 182]. But most ofthem are based on the convex relaxations. The problem of tensor completionbased on non-convex regularization is considered in [183, 184]. Generalizingtensor learning with non-convex relaxations to deal with practical issues isalso promising in future.

ACKNOWLEDGMENTS

This work was supported in part by the National Key Research andDevelopment Program of China under Grant 2018YFB1403501, in part by theNational Natural Science Foundation of China under Grant 61936014, Grant61772427 and Grant 61751202, and in part by the Fundamental ResearchFunds for the Central Universities under Grant G2019KY0501.

References

[1] Z. Lin, A review on low-rank models in data analysis, Big Data &Information Analytics 1 (2/3) (2017) 139–161.

[2] Y. Hu, D. Zhang, J. Ye, X. Li, X. He, Fast and accurate matrix com-pletion via truncated nuclear norm regularization, IEEE Transactionson Pattern Analysis and Machine Intelligence 35 (9) (2013) 2117–2130.

[3] S. Gu, L. Zhang, W. Zuo, X. Feng, Weighted nuclear norm minimizationwith application to image denoising, in: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2014, pp.2862–2869.

[4] S. Gu, et al, Weighted nuclear norm minimization and its applicationsto low level vision, International Journal of Computer Vision 121 (2)(2017) 183–208.

[5] X. Zhong, L. Xu, Y. Li, Z. Liu, E. Chen, A nonconvex relaxationapproach for rank minimization problems., in: Proceedings of AAAIConference on Artificial Intelligence, 2015, pp. 1980–1987.

[6] K. Mohan, M. Fazel, Iterative reweighted algorithms for matrix rankminimization, Journal of Machine Learning Research 13 (2012) 3441–3473.

28

[7] F. Nie, H. Huang, C. H. Ding, Low-rank matrix recovery via efficientschatten p-norm minimization., in: Proceedings of AAAI Conferenceon Artificial Intelligence, 2012.

[8] F. Shang, J. Cheng, Y. Liu, Z.-Q. Luo, Z. Lin, Bilinear factor matrixnorm minimization for robust pca: Algorithms and applications, IEEEtransactions on pattern analysis and machine intelligence 40 (9) (2017)2066–2080.

[9] F. Shang, Y. Liu, J. Cheng, Scalable algorithms for tractable schattenquasi-norm minimization., in: Proceedings of AAAI Conference onArtificial Intelligence, 2016, pp. 2016–2022.

[10] C. Lu, Z. Lin, S. Yan, Smoothed low rank and sparse matrix recovery byiteratively reweighted least squares minimization, IEEE Transactionson Image Processing 24 (2) (2015) 646–654.

[11] E. J. Candes, M. B. Wakin, S. P. Boyd, Enhancing sparsity by reweighted`1 minimization, Journal of Fourier Analysis and Applications 14 (5-6)(2008) 877–905.

[12] P. Ochs, A. Dosovitskiy, T. Brox, T. Pock, On iteratively reweightedalgorithms for nonsmooth nonconvex optimization in computer vision,SIAM Journal on Imaging Sciences 8 (1) (2015) 331–372.

[13] I. Daubechies, R. DeVore, M. Fornasier, C. S. Gunturk, Iterativelyreweighted least squares minimization for sparse recovery, Communi-cations on Pure and Applied Mathematics: A Journal Issued by theCourant Institute of Mathematical Sciences 63 (1) (2010) 1–38.

[14] C. Lu, J. Feng, S. Yan, Z. Lin, A unified alternating direction methodof multipliers by majorization minimization, IEEE Transactions onPattern Analysis and Machine Intelligence 40 (3) (2018) 527–541.

[15] H. Li, Z. Lin, Accelerated proximal gradient methods for nonconvexprogramming, in: Proceedings of the Advances in Neural InformationProcessing Systems, 2015, pp. 379–387.

[16] S. Ghadimi, G. Lan, Accelerated gradient methods for nonconvex nonlin-ear and stochastic programming, Mathematical Programming 156 (1-2)(2016) 59–99.

29

[17] Q. Yao, J. T. Kwok, T. Wang, T.-Y. Liu, Large-scale low-rank matrixlearning with nonconvex regularizers, IEEE Transactions on PatternAnalysis and Machine Intelligence 41 (11) (2018) 2628–2643.

[18] Q. Yao, J. T. Kwok, F. Gao, W. Chen, T.-Y. Liu, Efficient inexactproximal gradient algorithm for nonconvex problems (2017) 3308–3314.

[19] X. Zhang, et al, Accelerated training for matrix-norm regularization: Aboosting approach, in: Proceedings of the Advances in Neural Informa-tion Processing Systems, 2012, pp. 2906–2914.

[20] Y. Xu, L. Zhu, Z. Cheng, J. Li, J. Sun, Multi-feature discrete collabora-tive filtering for fast cold-start recommendation, in: Proceedings of theAAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 270–278.

[21] J. Li, K. Lu, Z. Huang, H. T. Shen, On both cold-start and long-tailrecommendation with social data, IEEE Transactions on Knowledgeand Data Engineering.

[22] Y. Sui, Y. Tang, L. Zhang, G. Wang, Visual tracking via subspacelearning: A discriminative approach, International Journal of ComputerVision 126 (5) (2018) 515–536.

[23] A. Agudo, M. Pijoan, F. Moreno-Noguer, Image collection pop-up:3d reconstruction and clustering of rigid and non-rigid categories, in:Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2018, pp. 2607–2615.

[24] Y. Dai, H. Li, M. He, A simple prior-free method for non-rigid structure-from-motion factorization, International Journal of Computer Vision107 (2) (2014) 101–122.

[25] Y. Zhu, D. Huang, F. De La Torre, S. Lucey, Complex non-rigid motion3d reconstruction by union of subspaces, in: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2014, pp.1542–1549.

[26] H. Peng, et al, Salient object detection via structured matrix decompo-sition, IEEE Transactions on Pattern Analysis and Machine Intelligence39 (4) (2017) 818–832.

30

[27] X. Shen, Y. Wu, A unified approach to salient object detection vialow rank matrix recovery, in: Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, 2012, pp. 853–860.

[28] W. Zou, K. Kpalma, Z. Liu, J. Ronsin, Segmentation driven low-rankmatrix recovery for saliency detection, in: Proceedings of the BritishMachine Vision Conference, 2013.

[29] Z. Gao, L.-F. Cheong, Y.-X. Wang, Block-sparse rpca for salient mo-tion detection, IEEE Transactions on Pattern Analysis and MachineIntelligence 36 (10) (2014) 1975–1987.

[30] E. J. Candes, T. Tao, The power of convex relaxation: near-optimalmatrix completion, IEEE Transactions on Information Theory 56 (5)(2010) 2053–2080.

[31] F. Nie, Z. Huo, H. Huang, Joint capped norms minimization for robustmatrix recovery, in: Proceedings of International Joint Conference onArtificial Intelligence, 2017, pp. 2557–2563.

[32] G. Liu, et al, Robust recovery of subspace structures by low-rankrepresentation, IEEE Transactions on Pattern Analysis and MachineIntelligence 35 (1) (2013) 171–184.

[33] L. Han, Y. Zhang, Multi-stage multi-task learning with reduced rank,in: Proceedings of AAAI Conference on Artificial Intelligence, 2016, pp.1638–1644.

[34] X. Zhen, M. Yu, X. He, S. Li, Multi-target regression via robust low-rank learning, IEEE Transactions on Pattern Analysis and MachineIntelligence 40 (2) (2018) 497–504.

[35] X. Zhou, C. Yang, H. Zhao, W. Yu, Low-rank modeling and its applica-tions in image analysis, ACM Computing Surveys (CSUR) 47 (2) (2014)1–33.

[36] S. Ma, N. S. Aybat, Efficient optimization algorithms for robust principalcomponent analysis and its variants, Proceedings of the IEEE 106 (8)(2018) 1411–1426.

31

[37] Z. Lin, H. Zhang, Low-rank Models in Visual Analysis: Theories,Algorithms, and Applications, Academic Press, 2017.

[38] E. Kim, M. Lee, S. Oh, Elastic-net regularization of singular values forrobust subspace learning, in: Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, 2015, pp. 915–923.

[39] T.-H. Oh, H. Kim, Y.-W. Tai, J.-C. Bazin, I. So Kweon, Partial summinimization of singular values in rpca for low-level vision, in: Proceed-ings of the IEEE International Conference on Computer Vision, 2013,pp. 145–152.

[40] Q. Sun, S. Xiang, J. Ye, Robust principal component analysis via cappednorms, in: Proceedings of the ACM SIGKDD International Conferenceon Knowledge Discovery and Data Mining, ACM, 2013, pp. 311–319.

[41] Z. Kang, C. Peng, Q. Cheng, Robust pca via nonconvex rank approxi-mation, in: Proceedings of the IEEE International Conference on DataMining, IEEE, 2015, pp. 211–220.

[42] C. Peng, Z. Kang, H. Li, Q. Cheng, Subspace clustering using log-determinant rank approximation, in: Proceedings of the ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining,ACM, 2015, pp. 925–934.

[43] J. H. Friedman, Fast sparse regression and classification, InternationalJournal of Forecasting 28 (3) (2012) 722–738.

[44] C. Lu, J. Tang, S. Yan, Z. Lin, Generalized nonconvex nonsmoothlow-rank minimization, in: Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, 2014, pp. 4130–4137.

[45] C. Gao, N. Wang, Q. Yu, Z. Zhang, A feasible nonconvex relaxationapproach to feature selection., in: Proceedings of AAAI Conference onArtificial Intelligence, 2011, pp. 356–361.

[46] D. Geman, C. Yang, Nonlinear image recovery with half-quadraticregularization, IEEE Transactions on Image Processing 4 (7) (1995)932–946.

32

[47] J. Trzasko, A. Manduca, Highly undersampled magnetic resonanceimage reconstruction via homotopic `0-minimization, IEEE Transactionson Medical imaging 28 (1) (2009) 106–121.

[48] C.-H. Zhang, et al., Nearly unbiased variable selection under minimaxconcave penalty, The Annals of Statistics 38 (2) (2010) 894–942.

[49] J. Fan, R. Li, Variable selection via nonconcave penalized likelihood andits oracle properties, Journal of the American Statistical Association96 (456) (2001) 1348–1360.

[50] M. Fazel, H. Hindi, S. P. Boyd, A rank minimization heuristic withapplication to minimum order system approximation, in: Proceedingsof the American Control Conference, Vol. 6, IEEE, 2001, pp. 4734–4739.

[51] V. Larsson, C. Olsson, Convex low rank approximation, InternationalJournal of Computer Vision 120 (2) (2016) 194–214.

[52] Q. Yao, J. T. Kwok, Efficient learning with a family of nonconvexregularizers by redistributing nonconvexity, Journal of Machine LearningResearch 18 (2017) 179:1–179:52.

[53] C. Lu, C. Zhu, et al, Generalized singular value thresholding., in:Proceedings of AAAI Conference on Artificial Intelligence, 2015, pp.1805–1811.

[54] R. Cabral, et al, Unifying nuclear norm and bilinear factorizationapproaches for low-rank matrix decomposition, in: Proceedings of theIEEE International Conference on Computer Vision, 2013, pp. 2488–2495.

[55] J.-F. Cai, E. J. Candes, Z. Shen, A singular value thresholding algorithmfor matrix completion, SIAM Journal on Optimization 20 (4) (2010)1956–1982.

[56] P. Jain, P. Kar, et al., Non-convex optimization for machine learning,Foundations and Trends® in Machine Learning 10 (3-4) (2017) 142–363.

[57] D. Goldfarb, S. Ma, Convergence of fixed-point continuation algorithmsfor matrix rank minimization, Foundations of Computational Mathe-matics 11 (2) (2011) 183–210.

33

[58] M. Frank, P. Wolfe, et al., An algorithm for quadratic programming,Naval research logistics quarterly 3 (1-2) (1956) 95–110.

[59] M. Jaggi, Revisiting frank-wolfe: Projection-free sparse convex opti-mization, in: Proceedings of the International Conference on MachineLearning, 2013, pp. 427–435.

[60] R. M. Freund, P. Grigas, R. Mazumder, An extended frank–wolfemethod with “in-face” directions, and its application to low-rank matrixcompletion, SIAM Journal on optimization 27 (1) (2017) 319–346.

[61] Y. E. Nesterov, A method for solving the convex programming problemwith convergence rate O(1/k2), in: Dokl. Akad. Nauk SSSR, Vol. 269,1983, pp. 543–547.

[62] S. Ji, J. Ye, An accelerated gradient method for trace norm minimization,in: Proceedings of the International Conference on Machine Learning,2009, pp. 457–464.

[63] A. Beck, M. Teboulle, Fast gradient-based algorithms for constrainedtotal variation image denoising and deblurring problems, IEEE Trans-actions on Image Processing 18 (11) (2009) 2419–2434.

[64] C. Kummerle, J. Sigl, Harmonic mean iteratively reweighted leastsquares for low-rank matrix recovery, The Journal of Machine LearningResearch 19 (1) (2018) 1815–1863.

[65] F. Nie, Z. Hu, X. Li, Calibrated multi-task learning, in: Proceedings ofthe ACM SIGKDD International Conference on Knowledge Discoveryand Data Mining, ACM, 2018, pp. 2012–2021.

[66] S. Boyd, N. Parikh, E. Chu, Distributed optimization and statisti-cal learning via the alternating direction method of multipliers, NowPublishers Inc, 2011.

[67] E. X. Fang, B. He, H. Liu, X. Yuan, Generalized alternating direc-tion method of multipliers: new theoretical insights and applications,Mathematical Programming Computation 7 (2) (2015) 149–187.

[68] Z. Xu, M. A. Figueiredo, X. Yuan, C. Studer, T. Goldstein, Adaptiverelaxed admm: Convergence theory and practical implementation, in:

34

Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2017, pp. 7234–7243.

[69] R. M. Larsen, Lanczos bidiagonalization with partial reorthogonaliza-tion, DAIMI Report Series 27 (537).

[70] T.-H. Oh, Y. Matsushita, Y.-W. Tai, I. S. Kweon, Fast randomized sin-gular value thresholding for low-rank optimization, IEEE Transactionson Pattern Analysis and Machine Intelligence 40 (2) (2018) 376–391.

[71] E. J. Candes, X. Li, Y. Ma, J. Wright, Robust principal componentanalysis?, Journal of the ACM (JACM) 58 (3) (2011) 1–37.

[72] T. K. Pong, P. Tseng, S. Ji, J. Ye, Trace norm regularization: Re-formulations, algorithms, and multi-task learning, SIAM Journal onOptimization 20 (6) (2010) 3465–3489.

[73] G. Liu, S. Yan, Latent low-rank representation for subspace segmenta-tion and feature extraction, in: Proceedings of International Conferenceon Computer Vision, IEEE, 2011, pp. 1615–1622.

[74] B.-K. Bao, G. Liu, C. Xu, S. Yan, Inductive robust principal componentanalysis, IEEE Transactions on Image Processing 21 (8) (2012) 3794–3800.

[75] Z. Zhang, S. Yan, M. Zhao, Similarity preserving low-rank representationfor enhanced data representation and effective subspace learning, NeuralNetworks 53 (2014) 81–94.

[76] Z. Zhang, L. Wang, S. Li, Y. Wang, Z. Zhang, Z. Zha, M. Wang,Adaptive structure-constrained robust latent low-rank coding for imagerecovery, in: Proceedings of the IEEE International Conference on DataMining, IEEE, 2019, pp. 846–855.

[77] Z. Zhang, J. Ren, S. Li, R. Hong, Z. Zha, M. Wang, Robust subspacediscovery by block-diagonal adaptive locality-constrained representation,in: Proceedings of the ACM International Conference on Multimedia,2019, pp. 1569–1577.

[78] T. H. Oh, et al, Partial sum minimization of singular values in ro-bust PCA: algorithm and applications, IEEE Transactions on PatternAnalysis and Machine Intelligence 38 (4) (2016) 744–758.

35

[79] P. V. Giampouras, A. A. Rontogiannis, K. D. Koutroumbas, Alternatingiteratively reweighted least squares minimization for low-rank matrixfactorization, IEEE Transactions on Signal Processing 67 (2) (2018)490–503.

[80] Z. Lin, M. Chen, Y. Ma, The augmented lagrange multiplier method forexact recovery of corrupted low-rank matrices, CoRR abs/1009.5055.

[81] E. J. Candes, B. Recht, Exact matrix completion via convex optimiza-tion, Foundations of Computational mathematics 9 (6) (2009) 717.

[82] Y. Chen, H. Xu, C. Caramanis, S. Sanghavi, Matrix completion withcolumn manipulation: Near-optimal sample-robustness-rank tradeoffs,IEEE Transactions on Information Theory 62 (1) (2016) 503–526.

[83] J. Chen, J. Zhou, J. Ye, Integrating low-rank and group-sparse structuresfor robust multi-task learning, in: Proceedings of the InternationalConference on Knowledge Discovery and Data Mining, 2011, pp. 42–50.

[84] Z. Ding, Y. Fu, Robust multiview data analysis through collective low-rank subspace, IEEE Transactions on Neural Networks and LearningSystems 29 (5) (2018) 1986–1997.

[85] Y. Kong, M. Shao, K. Li, Y. Fu, Probabilistic low-rank multitasklearning, IEEE Transactions on Neural Networks and Learning Systems29 (3) (2018) 670–680.

[86] F. Nie, H. Huang, Subspace clustering via new low-rank model withdiscrete group structure constraint., in: Proceedings of InternationalJoint Conference on Artificial Intelligence, 2016, pp. 1874–1880.

[87] G. Liu, Z. Zhang, Q. Liu, H. Xiong, Robust subspace clustering withcompressed data, IEEE Transactions on Image Processing 28 (10) (2019)5161–5170.

[88] M. Yin, J. Gao, Z. Lin, Laplacian regularized low-rank representa-tion and its applications, IEEE Transactions on Pattern Analysis andMachine Intelligence 38 (3) (2016) 504–517.

[89] Z. Zhang, Y. Xu, L. Shao, J. Yang, Discriminative block-diagonalrepresentation learning for image recognition, IEEE Transactions onNeural Networks and Learning Systems 29 (7) (2018) 3111–3125.

36

[90] C. Peng, Z. Kang, H. Li, Q. Cheng, Subspace clustering using log-determinant rank approximation, in: Proceedings of the ACM SIGKDDInternational Conference on Knowledge Discovery and Data Mining,ACM, 2015, pp. 925–934.

[91] Z. Zhang, F. Li, M. Zhao, L. Zhang, S. Yan, Robust neighborhoodpreserving projection by nuclear `2,1-norm regularization for imagefeature extraction, IEEE Transactions on Image Processing 26 (4)(2017) 1607–1622.

[92] F. Nie, Z. Hu, X. Li, An investigation for loss functions widely used inmachine learning, Communications in Information and Systems 18 (1)(2018) 37–52.

[93] C. Sagonas, Y. Panagakis, A. Leidinger, S. Zafeiriou, et al., Robustjoint and individual variance explained, in: Proceedings of the IEEEInternational Conference on Computer Vision Pattern Recognition,Vol. 2, 2017, p. 6.

[94] Y. Panagakis, M. A. Nicolaou, S. Zafeiriou, M. Pantic, Robust correlatedand individual component analysis, IEEE Transactions on PatternAnalysis and Machine Intelligence 38 (8) (2016) 1665–1678.

[95] W. Dong, G. Shi, X. Li, Y. Ma, F. Huang, Compressive sensing via non-local low-rank regularization, IEEE Transactions on Image Processing23 (8) (2014) 3618–3632.

[96] H. Gao, F. Nie, X. Li, H. Huang, Multi-view subspace clustering, in:Proceedings of the IEEE International Conference on Computer Vision,2015, pp. 4238–4246.

[97] M. Liu, Y. Luo, D. Tao, C. Xu, Y. Wen, Low-rank multi-view learning inmatrix completion for multi-label image classification, in: Proceedingsof the AAAI Conference on Artificial Intelligence, 2015, pp. 2778–2784.

[98] R. Xia, Y. Pan, L. Du, J. Yin, Robust multi-view spectral clusteringvia low-rank and sparse decomposition, in: Proceedings of the AAAIConference on Artificial Intelligence, 2014, pp. 2149–2155.

[99] Y. Wang, W. Zhang, L. Wu, X. Lin, M. Fang, S. Pan, Iterative viewsagreement: An iterative low-rank based structured optimization method

37

to multi-view spectral clustering, in: Proceedings of the InternationalJoint Conference on Artificial Intelligence, 2016, pp. 2153–2159.

[100] Z. Ding, Y. Fu, Dual low-rank decompositions for robust cross-viewlearning, IEEE Transactions on Image Processing 28 (1) (2018) 194–204.

[101] J. Li, Y. Wu, J. Zhao, K. Lu, Low-rank discriminant embedding formultiview learning, IEEE Transactions on Cybernetics 47 (11) (2017)3516–3529.

[102] Z. Ding, M. Shao, Y. Fu, Latent low-rank transfer subspace learning formissing modality recognition, in: Proceedings of the AAAI Conferenceon Artificial Intelligence, 2014, pp. 1192–1198.

[103] M. Shao, D. Kit, Y. Fu, Generalized transfer subspace learning throughlow-rank constraint, International Journal of Computer Vision 109 (1-2)(2014) 74–93.

[104] Y. Chen, A. Jalali, S. Sanghavi, H. Xu, Clustering partially observedgraphs via convex optimization, Journal of Machine Learning Research15 (1) (2014) 2213–2238.

[105] J. Ren, Z. Zhang, S. Li, Y. Wang, G. Liu, S. Yan, M. Wang, Learn-ing hybrid representation by robust dictionary learning in factorizedcompressed space, IEEE Transactions on Image Processing 29 (2020)3941–3956.

[106] Z. Huo, F. Nie, H. Huang, Robust and effective metric learning usingcapped trace norm: Metric learning via capped trace norm, in: Pro-ceedings of the ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, 2016, pp. 1605–1614.

[107] L. Wang, B. Wang, Z. Zhang, Q. Ye, L. Fu, G. Liu, M. Wang, Ro-bust auto-weighted projective low-rank and sparse recovery for visualrepresentation, Neural Networks 117 (2019) 201–215.

[108] N. Xue, J. Deng, S. Cheng, Y. Panagakis, S. Zafeiriou, Side informationfor face completion: a robust pca approach, IEEE Transactions onPattern Analysis and Machine Intelligence 41 (10) (2019) 2349–2364.

38

[109] N. Xue, Y. Panagakis, S. Zafeiriou, Side information in robust principalcomponent analysis: Algorithms and applications, in: Proceedings ofthe IEEE International Conference on Computer Vision, IEEE, 2017,pp. 4327–4335.

[110] C. Sagonas, et al, Robust statistical frontalization of human and animalfaces, International Journal of Computer Vision 122 (2) (2017) 270–291.

[111] C. Su, et al, Multi-task learning with low rank attribute embedding formulti-camera person re-identification, IEEE Transactions on PatternAnalysis and Machine Intelligence 40 (5) (2018) 1167–1181.

[112] Y. Sui, L. Zhang, Robust tracking via locally structured representation,International Journal of Computer Vision 119 (2) (2016) 110–144.

[113] Y. Chang, L. Yan, S. Zhong, Transformed low-rank model for linepattern noise removal, in: Proceedings of the IEEE International Con-ference on Computer Vision, 2017, pp. 1726–1734.

[114] T. Huang, W. Dong, X. Xie, G. Shi, X. Bai, Mixed noise removal vialaplacian scale mixture modeling and nonlocal low-rank approximation,IEEE Transactions on Image Processing 26 (7) (2017) 3171–3186.

[115] R. Wang, E. Trucco, Single-patch low-rank prior for non-pointwiseimpulse noise removal, in: Proceedings of the IEEE International Con-ference on Computer Vision, 2013, pp. 1073–1080.

[116] P. Kumar, R. R. Sahay, Accurate structure recovery via weighted nuclearnorm: A low rank approach to shape-from-focus, in: Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition, 2017,pp. 563–574.

[117] W. Ren, J. Tian, Z. Han, A. Chan, Y. Tang, Video desnowing andderaining based on matrix decomposition, in: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2017, pp.4210–4219.

[118] J. Yang, et al, Nuclear norm based matrix regression with applicationsto face recognition with occlusion and illumination changes, IEEETransactions on Pattern Analysis and Machine Intelligence 39 (1) (2017)156–171.

39

[119] J. Lezama, Q. Qiu, G. Sapiro, Not afraid of the dark: Nir-vis facerecognition via cross-spectral hallucination and low-rank embedding, in:Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, IEEE, 2017, pp. 6807–6816.

[120] Y. Li, J. Liu, Z. Li, Y. Zhang, H. Lu, S. Ma, et al., Learning low-rankrepresentations with classwise block-diagonal structure for robust facerecognition., in: AAAI, 2014, pp. 2810–2816.

[121] T. H. Oh, J. Lee, Y. Tai, I. Kweon, Robust high dynamic range imagingby rank minimization, IEEE Transactions on Pattern Analysis andMachine Intelligence 37 (6) (2015) 1219–1232.

[122] H. Zhao, Z. Ding, Y. Fu, Pose-dependent low-rank embedding for headpose estimation., in: Proceedings of AAAI Conference on ArtificialIntelligence, 2016, pp. 1422–1428.

[123] M. Shakeri, H. Zhang, Moving object detection in time-lapse or motiontrigger image sequences using low-rank and invariant sparse decom-position, in: Proceedings of the IEEE International Conference onComputer Vision, 2017, pp. 5133–5141.

[124] B.-J. Han, J.-Y. Sim, Reflection removal using low-rank matrix comple-tion, in: Proceedings of the IEEE Conference on Computer Vision andPattern Recognition, Vol. 2, 2017.

[125] Z. Ding, M. Shao, Y. Fu, Low-rank embedded ensemble semantic dic-tionary for zero-shot learning, in: Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, 2017, pp. 2050–2058.

[126] L. Zhu, C.-W. Fu, M. S. Brown, P.-A. Heng, A non-local low-rankframework for ultrasound speckle reduction, in: Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition, 2017,pp. 5650–5658.

[127] M. Li, J. Liu, Z. Xiong, X. Sun, Z. Guo, Marlow: A joint multiplanarautoregressive and low-rank approach for image completion, in: Pro-ceedings of the European Conference on Computer Vision, Springer,2016, pp. 819–834.

40

[128] X. Zhou, M. Zhu, K. Daniilidis, Multi-image matching via fast alternat-ing minimization, in: Proceedings of the IEEE International Conferenceon Computer Vision, 2015, pp. 4032–4040.

[129] C. Li, L. Lin, W. Zuo, S. Yan, J. Tang, Sold: Sub-optimal low-rankdecomposition for efficient video segmentation, in: Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition, 2015,pp. 5519–5527.

[130] X. Peng, S. Zhang, Y. Yu, D. N. Metaxas, Toward personalized modeling:Incremental and ensemble alignment for sequential faces in the wild,International Journal of Computer Vision 126 (2-4) (2018) 184–197.

[131] C. Sagonas, Y. Panagakis, S. Zafeiriou, M. Pantic, Raps: Robust andefficient automatic construction of person-specific deformable models, in:Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2014, pp. 1789–1796.

[132] Y. Peng, A. Ganesh, J. Wright, W. Xu, Y. Ma, RASL: robust alignmentby sparse and low-rank decomposition for linearly correlated images,IEEE Transactions on Pattern Analysis and Machine Intelligence 34 (11)(2012) 2233–2246.

[133] C. Zhao, W.-K. Cham, X. Wang, Joint face alignment with a genericdeformable face model, in: Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, IEEE, 2011, pp. 561–568.

[134] J. Li, X. Chen, D. Zou, B. Gao, W. Teng, Conformal and low-ranksparse representation for image restoration, in: Proceedings of the IEEEInternational Conference on Computer Vision, 2015, pp. 235–243.

[135] R. Cabral, et al, Matrix completion for weakly-supervised multi-label im-age classification, IEEE Transactions on Pattern Analysis and MachineIntelligence 37 (1) (2015) 121–135.

[136] T. Zhang, B. Ghanem, S. Liu, C. Xu, N. Ahuja, Low-rank sparse codingfor image classification, in: Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition, 2013, pp. 281–288.

[137] Z. Zhang, F. Li, M. Zhao, L. Zhang, S. Yan, Joint low-rank and sparseprincipal feature coding for enhanced robust representation and visual

41

classification, IEEE Transactions on Image Processing 25 (6) (2016)2429–2443.

[138] X. Zhu, X.-Y. Jing, F. Wu, D. Wu, L. Cheng, S. Li, R. Hu, Multi-kernel low-rank dictionary pair learning for multiple features basedimage classification., in: Proceedings of AAAI Conference on ArtificialIntelligence, 2017, pp. 2970–2976.

[139] Z. Jiang, P. Guo, L. Peng, Locality-constrained low-rank coding for im-age classification, in: Proceedings of the AAAI Conference on ArtificialIntelligence, 2014, pp. 2780–2786.

[140] Y. Zhang, Z. Jiang, L. S. Davis, Learning structured low-rank represen-tations for image classification, in: Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, 2013, pp. 676–683.

[141] X. Cheng, S. Sridharan, J. Saragih, S. Lucey, Rank minimization acrossappearance and shape for aam ensemble fitting, in: Proceedings of theIEEE International Conference on Computer Vision, 2013, pp. 577–584.

[142] B. Cheng, G. Liu, J. Wang, Z. Huang, S. Yan, Multi-task low-rankaffinity pursuit for image segmentation, in: Proceedings of the IEEEInternational Conference on Computer Vision, IEEE, 2011, pp. 2439–2446.

[143] G. Liu, S. Yan, Latent low-rank representation for subspace segmenta-tion and feature extraction, in: Proceedings of the IEEE InternationalConference on Computer Vision, IEEE, 2011, pp. 1615–1622.

[144] Q. Yao, J. T. Kwok, Colorization by patch-based local low-rank ma-trix completion, in: Proceedings of AAAI Conference on ArtificialIntelligence, 2015, pp. 1959–1965.

[145] S. Wang, Z. Zhang, Colorization by matrix completion, in: Proceedingsof the AAAI Conference on Artificial Intelligence, 2012, pp. 626–633.

[146] H. Fan, Y. Luo, L. Qi, N. Wang, J. Dong, H. Yu, Robust photometricstereo in a scattering medium via low-rank matrix completion andrecovery, in: Proceedings of the International Conference on HumanSystem Interactions, 2016, pp. 323–329.

42

[147] Z. Zhang, A. Ganesh, X. Liang, Y. Ma, TILT: transform invariantlow-rank textures, International Journal of Computer Vision 99 (1)(2012) 1–24.

[148] C. Georgakis, Y. Panagakis, M. Pantic, Dynamic behavior analysisvia structured rank minimization, International Journal of ComputerVision 126 (2-4) (2018) 333–357.

[149] S. Tulyakov, X. Alameda-Pineda, E. Ricci, L. Yin, J. F. Cohn, N. Sebe,Self-adaptive matrix completion for heart rate estimation from facevideos under realistic conditions, in: Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, 2016, pp. 2396–2404.

[150] L. Shi, Sparse additive text models with low rank background, in:Proceedings of the Advances in Neural Information Processing Systems,2013, pp. 172–180.

[151] Y. Pan, H. Lai, C. Liu, Y. Tang, S. Yan, Rank aggregation via low-rankand structured-sparse decomposition, in: Proceedings of the AAAIConference on Artificial Intelligence, 2013.

[152] Z. Ding, M. Shao, Y. Fu, Deep robust encoder through locality preserv-ing low-rank dictionary, in: Proceedings of the European ConferenceEu-ropean Conference Computer Vision, 2016, pp. 567–582.

[153] J. Lezama, Q. Qiu, P. Muse, G. Sapiro, Ole: Orthogonal low-rankembedding, a plug and play geometric loss for deep learning, in: Pro-ceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2018, pp. 8109–8118.

[154] Z. Ding, Y. Fu, Deep transfer low-rank coding for cross-domain learning,IEEE Transactions on Neural Networks and Learning Systems 30 (6)(2018) 1768–1779.

[155] X. Zhou, C. Yang, W. Yu, Moving object detection by detecting con-tiguous outliers in the low-rank representation, IEEE Transactions onPattern Analysis and Machine Intelligence 35 (3) (2013) 597–610.

[156] T. Bouwmans, et al, Decomposition into low-rank plus additive matricesfor background/foreground separation: A review for a comparative

43

evaluation with a large-scale dataset, Computer Science Review 23(2017) 1–71.

[157] Y. Koren, R. Bell, C. Volinsky, Matrix factorization techniques forrecommender systems, Computer 42 (8) (2009) 30–37.

[158] J. Li, M. Jing, K. Lu, L. Zhu, Y. Yang, Z. Huang, From zero-shotlearning to cold-start recommendation, in: Proceedings of the AAAIConference on Artificial Intelligence, Vol. 33, 2019, pp. 4189–4196.

[159] Y. Chi, Y. M. Lu, Y. Chen, Nonconvex optimization meets low-rank ma-trix factorization: An overview, IEEE Transactions on Signal Processing67 (20) (2019) 5239–5269.

[160] K. Dabov, A. Foi, V. Katkovnik, K. O. Egiazarian, Image denoising bysparse 3-d transform-domain collaborative filtering, IEEE Transactionson Image Processing 16 (8) (2007) 2080–2095.

[161] S. Wang, L. Zhang, Y. Liang, Nonlocal spectral prior model for low-levelvision, in: Proceedings of the Asian Conference on Computer Visio,2012, pp. 231–244.

[162] N. Yair, T. Michaeli, Multi-scale weighted nuclear norm image restora-tion, in: Proceedings of the IEEE Conference on Computer Vision andPattern Recognition, 2018, pp. 3165–3174.

[163] J. Xu, L. Zhang, D. Zhang, X. Feng, Multi-channel weighted nuclearnorm minimization for real color image denoising, in: Proceedings ofthe IEEE International Conference on Computer Vision, Vol. 2, 2017.

[164] A. Agudo, F. Moreno-Noguer, Dust: Dual union of spatio-temporal sub-spaces for monocular multiple object 3d reconstruction, in: Proceedingsof the IEEE Conference on Computer Vision and Pattern Recognition,Vol. 1, 2017, p. 2.

[165] S. Kumar, et al, Spatio-temporal union of subspaces for multi-bodynon-rigid structure-from-motion, Pattern Recognition 71 (2017) 428–443.

[166] Y. Xu, Y. Li, S. Zhang, W. Wen, B. Wang, Y. Qi, Y. Chen, W. Lin,H. Xiong, Trained rank pruning for efficient deep neural networks, arXivpreprint arXiv:1812.02402.

44

[167] X. Piao, Y. Hu, J. Gao, Y. Sun, B. Yin, Double nuclear norm basedlow rank representation on grassmann manifolds for clustering, in:Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2019, pp. 12075–12084.

[168] Y. Zhong, P. Ji, J. Wang, Y. Dai, H. Li, Unsupervised deep epipolarflow for stationary or dynamic scenes, in: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2019, pp.12095–12104.

[169] W. Zhu, Q. Qiu, B. Wang, J. Lu, G. Sapiro, I. Daubechies, Stopmemorizing: A data-dependent regularization framework for intrinsicpattern learning, SIAM Journal on Mathematics of Data Science 1 (3)(2019) 476–496.

[170] X. Zhang, J. Zou, X. Ming, K. He, J. Sun, Efficient and accurateapproximations of nonlinear convolutional networks, in: Proceedingsof the IEEE Conference on Computer Vision and pattern Recognition,2015, pp. 1984–1992.

[171] A. Sanyal, P. H. Torr, P. K. Dokania, Stable rank normalization forimproved generalization in neural networks and gans, in: Proceedingsof International Conference on Learning Representations, 2019.

[172] T. Zhang, et al, Robust visual tracking via consistent low-rank sparselearning, International Journal of Computer Vision 111 (2) (2015) 171–190.

[173] Y.-L. Chen, C.-T. Hsu, A generalized low-rank appearance model forspatio-temporally correlated rain streaks, in: Proceedings of the IEEEInternational Conference on Computer Vision, 2013, pp. 1968–1975.

[174] Z. Li, S. Yang, L.-F. Cheong, K.-C. Toh, Simultaneous clusteringand model selection for tensor affinities, in: Proceedings of the IEEEConference on Computer Vision and Pattern Recognition, 2016, pp.5347–5355.

[175] C. Zhang, H. Fu, S. Liu, G. Liu, X. Cao, Low-rank tensor constrainedmultiview subspace clustering, in: Proceedings of the IEEE InternationalConference on Computer Vision, 2015, pp. 1582–1590.

45

[176] T. Yokota, B. Erem, S. Guler, S. K. Warfield, H. Hontani, Missingslice recovery for tensors using a low-rank model in embedded space, in:Proceedings of the IEEE Conference on Computer Vision and PatternRecognition, 2018, pp. 8251–8259.

[177] C.-Y. Ko, K. Batselier, L. Daniel, W. Yu, N. Wong, Fast and accuratetensor completion with total variation regularized tensor trains, IEEETransactions on Image Processing.

[178] A. Zare, A. Ozdemir, M. A. Iwen, S. Aviyente, Extension of pca to higherorder data structures: An introduction to tensors, tensor decompositions,and tensor pca, Proceedings of the IEEE (99).

[179] T. Yokota, H. Hontani, Simultaneous visual data completion and de-noising based on tensor rank and total variation minimization and itsprimal-dual splitting algorithm, in: Proceedings of the IEEE Conferenceon Computer Vision and Pattern Recognition, 2017, pp. 3732–3740.

[180] C. Lu, J. Feng, Y. Chen, W. Liu, Z. Lin, S. Yan, Tensor robust principalcomponent analysis with a new tensor nuclear norm, IEEE Transactionson Pattern Analysis and Machine Intelligence 42 (4) (2019) 925–938.

[181] P. Zhou, C. Lu, J. Feng, Z. Lin, S. Yan, Tensor low-rank representationfor data recovery and clustering, IEEE Transactions on Pattern Analysisand Machine Intelligence.

[182] C. Lu, J. Feng, Z. Lin, S. Yan, Exact low tubal rank tensor recoveryfrom gaussian measurements, in: Proceedings of the International JointConference on Artificial Intelligence, 2018.

[183] T.-Y. Ji, T.-Z. Huang, X.-L. Zhao, T.-H. Ma, L.-J. Deng, A non-convextensor rank approximation for tensor completion, Applied MathematicalModelling 48 (2017) 410–422.

[184] Q. Yao, J. T.-Y. Kwok, B. Han, Efficient nonconvex regularized tensorcompletion with structure-aware proximal iterations, in: InternationalConference on Machine Learning, 2019, pp. 7035–7044.

46

journal of la low rank regularization: a review · journal of latex class files, vol. 14, no. 8,...

Documents