

Online Structured Sparsity-based Moving Object Detection from Satellite Videos

Junpeng Zhang, Student Member, IEEE, Xiuping Jia, Senior Member, IEEE, Jiankun Hu, Senior Member, IEEE, and Jocelyn Chanussot, Fellow, IEEE

Abstract—Inspired by recent developments in computer vision, low-rank and structured sparse matrix decomposition can potentially be used to extract moving objects in satellite videos. This set of approaches seeks rank minimization on the background, which typically requires batch-based optimization over a sequence of frames, causing delays in processing and limiting their applications. To remedy this delay, we propose an Online Low-rank and Structured Sparse Decomposition (O-LSD). O-LSD reformulates the batch-based low-rank matrix decomposition with the structured sparse penalty into its equivalent frame-wise separable counterpart, which then defines a stochastic optimization problem for online subspace basis estimation. To enable online processing, O-LSD alternates the foreground and background separation and the subspace basis update for every frame in a video. We also show the convergence of O-LSD theoretically. Experimental results on two satellite videos demonstrate that the performance of O-LSD in terms of accuracy and time consumption is comparable with the batch-based approaches, with a significantly reduced delay in processing.

Index Terms—Satellite Video Processing, Moving Object Detection, Online Robust Principal Component Analysis, Structured Sparsity-Inducing Norm, Background Subtraction

I. INTRODUCTION

OBJECT detection on high-resolution aerial images has been actively investigated in recent years [1], [2]. Inspired by the state-of-the-art deep learning methods, such as the Region-based Convolutional Neural Network (R-CNN) [3], Fast R-CNN [4], Faster R-CNN [5], You Only Look Once (YOLO) [6] and the Single Shot MultiBox Detector (SSD) [7], object detection performance on these images has improved significantly [8], [9], [10], [11]. These approaches mainly exploit the spectral (or color) and spatial (texture or context) information of the objects of interest, and they detect objects image by image, since no temporal information is available in those images. Recently, with satellite videos captured by Jilin-1 [12] and Skybox [13], dense temporal information has become available, which benefits moving object detection from space. Target tracking also becomes possible and can be conducted for various applications [14], [15], [16], [17].

Detecting moving objects in a video is achieved by separating the temporally varying foreground, which is associated with the moving objects, from the background, which lies in a low-dimensional subspace [18], [19], [20]. Given that the moving objects account for a limited number of pixels in each frame, the foreground is assumed sparse. Robust Principal Component Analysis (RPCA), a fundamental method for foreground extraction, defines a low-rank matrix decomposition problem with a sparse penalty [21], which is solved by Principal Component Pursuit (RPCA-PCP) [22], [21], [23] and the fast low-rank approximation GoDec [24]. Based on the duality between sparsity and the Laplace distribution, Probabilistic Robust Matrix Factorization (PRMF) provides a probabilistic interpretation of RPCA by combining a Laplace error model and Gaussian priors [25].

As a moving object is commonly a set of neighboring pixels, spatial priors on the foreground have been considered in low-rank matrix decomposition to improve moving object detection performance. Total Variation (TV) regularization was introduced to enforce smoothness of the foreground in the matrix decomposition [26]. DEtecting Contiguous Outliers in the LOw-rank Representation (DECOLOR) constrains the edges of moving objects to be contiguous by integrating a first-order Markov Random Field (MRF) into the low-rank matrix decomposition [27]. Another possible spatial prior on the foreground is sparsity over groups of spatially neighboring pixels, rather than pixel-wise sparsity, which is measured by the structured sparsity-inducing norm [28]. Low-rank and Structured Sparse Decomposition (LSD) follows this prior and penalizes the low-rank matrix decomposition by the structured sparsity-inducing norm of the foreground [29]. As moving object detection in satellite videos is more sensitive to random noise, integrating spatial priors should improve the quality of the estimated foreground and thus the moving object detection performance. By integrating structured sparsity, LSD presents boosted Moving Object Detection (MOD) performance in satellite videos [30]. DECOLOR, however, offers limited improvement in MOD performance for satellite videos, as the introduced MRF constraint tends to merge neighboring targets when the distance between them is too small.

Regardless of the detection performance of the algorithms above, a pitfall shared by them is that their solutions rely on optimization in a batch manner. These approaches use Singular Value Decomposition (SVD) for low-rank background estimation, which couples all the samples in each iteration of the optimization. The detection results are not available until the optimization terminates, which results in delays in processing. Another shortcoming of batch-based approaches is the difficulty of handling a video of incremental length. Both issues limit the application of batch-based algorithms in online systems.

In order to reduce the delay and to make MOD adaptable to videos of incremental length, an online method is expected to estimate the foreground and background sequentially for each new incoming frame. In low-rank matrix decomposition, the estimated principal component basis vectors represent a subspace in which the background lies.


TABLE I
COMPARISON OF ONLINE LOW-RANK MATRIX DECOMPOSITION ALGORITHMS FOR MOVING OBJECT DETECTION

Method | Objective Function | Constraints | Spatial Prior | Optimization Scheme | Proven Convergence
OPRMF [25] | min λ‖D − LR‖_1 + (λ1/2)‖L‖_F^2 + (λ2/2)‖R‖_F^2 | L_ij|λ1 ~ N(L_ij|0, λ1^{-1}), R_ij|λ2 ~ N(R_ij|0, λ2^{-1}) | - | Online Expectation Maximization | No
GRASTA [31] | min ‖s‖_1 | d = Lr + s, L^T L = I | - | Incremental gradient descent on the Grassmannian manifold | No
OR-PCA [32] | min (1/2)‖D − LR − S‖_F^2 + (λ1/2)‖L‖_F^2 + (λ1/2)‖R‖_F^2 + λ2‖S‖_1 | - | - | Stochastic optimization | Yes
COROLA [33] | min (1/2)‖P_S⊥(D − LR)‖_F^2 + (α/2)‖P_S⊥(L)‖_F^2 + (α/2)‖P_S⊥(R)‖_F^2 + β‖S‖_1 + γ‖A vec(S)‖ | S_ij ∈ {0, 1} | Edge contiguousness | Stochastic optimization | Yes
GOSUS [34] | min Σ_{i=1}^{l} μ_i ‖G_i s‖_2 + (λ/2)‖Lr + s − d‖_2^2 | L^T L = I | Structured sparsity of foreground | Incremental gradient descent on the Grassmannian manifold | No
Proposed O-LSD | min (1/2)‖D − LR − S‖_F^2 + (λ1/2)‖L‖_F^2 + (λ1/2)‖R‖_F^2 + λ2 Σ_{s∈S} ‖s‖_{ℓ1/ℓ∞} | - | Structured sparsity of foreground | Stochastic optimization | Yes

*A variety of low-rank decomposition algorithms exist in the literature; we list only the most related works here. Interested readers may refer to [18], [19], [20], [35] for more comprehensive reviews of low-rank matrix decomposition and its applications in video processing.

This subspace can be identified by a point on the Grassmannian manifold, and incremental gradient descent on the Grassmannian manifold is employed for online subspace tracking or updating [36], [37], [38]. For moving object detection, the Grassmannian Robust Adaptive Subspace Tracking Algorithm (GRASTA) [31] and Grassmannian Online Subspace Updates with Structured-sparsity (GOSUS) [34] were developed for online low-rank matrix decomposition with the pixel-wise sparse penalty and the structured sparse penalty, respectively. This set of approaches, however, provides no theoretical guarantee on convergence, and their performance is heavily sensitive to the selection of the learning rate.

Another possibility for online low-rank matrix decomposition is the increasingly common matrix factorization approximation of the nuclear norm, where rank minimization is replaced by the sum of squared Frobenius norms of the factors [39], [32], [40], [33], [41]. With this reformulation, an iterative scheme based on stochastic optimization can be developed to solve the low-rank decomposition problem online. Online Robust Principal Component Analysis (OR-PCA) solves the online low-rank matrix decomposition problem with a pixel-wise sparsity penalty and, more importantly, proves that the iterative optimization algorithm converges to the global optimum of the original RPCA approach [32]. Utilizing a first-order Markov Random Field, a spatial prior on contiguous edges has also been integrated with this reformulation for online low-rank matrix decomposition [33].

It has been observed that, by introducing the structured sparse penalty, LSD boosts the moving object detection performance in satellite videos [30]. While GOSUS combines online low-dimensional subspace tracking with structured sparsity, no theoretical guarantee on its convergence is provided. To the best of our knowledge, there exists a gap between online algorithms with theoretically guaranteed convergence and those with the structured sparsity penalty, as demonstrated in Table I. In order to fill this gap, we present an online low-rank matrix decomposition approach with a structured sparse penalty, named Online Low-rank and Structured Sparse Decomposition (O-LSD), which not only incorporates the structured sparsity penalty but also provides theoretically guaranteed convergence. We follow the matrix factorization approximation of the nuclear norm for online learning in [32], [40], [33], [41], and decompose the background matrix into a set of background frames reconstructed from the estimated subspace basis and their associated coefficients. To enable online processing, the proposed O-LSD algorithm is composed of two building blocks. For each frame, its corresponding foreground and background frames are reconstructed with the current subspace basis, and the subspace basis is then updated for this new input. This procedure defines a stochastic optimization problem, and we show that the O-LSD algorithm converges almost surely. Existing convergence analyses in [42], [32], [41], [43] are built on necessary and sufficient conditions for the unique solution of sparse encoding. In O-LSD, no such conditions exist for structured sparsity encoding to the best of our knowledge; we instead show the convergence of O-LSD based on the boundedness of the sub-gradients in structured sparsity encoding, and a set of related properties of O-LSD is then demonstrated. Experimental evaluations and analysis were performed on a satellite video dataset with two videos, where we compared our algorithm with five state-of-the-art algorithms.

In summary, the main contributions of this work are four-fold:

1) We propose an Online Low-rank and Structured Sparse Decomposition (O-LSD) for moving object detection in satellite videos by reformulating the batch-based LSD using the matrix factorization approximation of the nuclear norm.

2) To solve the reformulated optimization problem,


two alternating steps are designed and developed. For each frame, the corresponding foreground and background frames are first reconstructed with the current subspace basis, and the subspace basis is then updated using the given frame.

3) We show that O-LSD converges almost surely. In contrast to most current online algorithms, we show that O-LSD can converge without meeting the conditions for the unique solution of structured sparsity encoding. Due to the lack of these conditions, the solution of O-LSD may converge to neither a stationary point nor the global optimum of its batch-based counterpart, LSD. This finding and its corresponding proof are useful beyond the scope of this paper.

4) Due to the good convergence characteristics of O-LSD, its processing delay can be further reduced, with negligible effect on the detection performance, by down-sampling in the temporal domain.

The remainder of this paper is organized as follows. The proposed O-LSD is presented in Section II, and its convergence analysis is provided in Section II-E. The experimental parameter settings and the performance comparison against state-of-the-art approaches are presented in Section III. Finally, conclusions and suggestions for future research are given in Section IV.

II. PROPOSED METHOD

A. Matrix Factorization to LSD

Low-rank and Structured Sparse Decomposition (LSD) seeks a low-rank matrix decomposition of an observation matrix while imposing the structured sparse penalty on the foreground. Given a fixed-length sequence of n video frames, each containing p pixels, LSD decomposes the corresponding matrix D ∈ R^{p×n} into a low-rank background matrix B ∈ R^{p×n} plus a structured sparse foreground matrix S ∈ R^{p×n}, and defines the batch-based optimization problem

(B*, S*) = arg min_{B,S} ‖B‖_* + λ Ω(S)   s.t.  D = B + S,      (1)

where Ω(S) refers to the structured sparsity-inducing norm of S and λ is a scalar that weights the structured sparsity. The structured sparsity-inducing norm [28], [44], [45] measures the sparsity over groups of neighboring pixels as

Ω(S) = Σ_{s∈S} ‖s‖_{ℓ1/ℓ∞} = Σ_{s∈S} Σ_{g∈G} η_g ‖s|_g‖_∞,      (2)

where G defines the set of groups of neighboring pixels, and s|_g ∈ R^p is a sparse vector with non-zero elements only at the indices in a group g ∈ G. η_g specifies the weight of a group of pixels. In this paper, we assume each group contributes equally and set η_g = 1.0 for all g ∈ G. In LSD, no temporal priors or constraints on the foreground are considered; thus the structured sparse penalty over a sequence of frames is frame-wise separable. In this paper, the set of groups of spatially related pixels, G, is constructed by scanning a 3×3 grid over the foreground, as the moving targets in satellite videos are usually small.
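As an illustration of Eq. (2) and the 3×3 grid scanning described above, the following Python sketch enumerates the overlapping 3×3 groups and evaluates the structured sparsity-inducing norm of one vectorized foreground frame. The names build_groups and structured_norm are illustrative and not taken from any released implementation; the uniform weight η_g = 1.0 follows the setting above.

import numpy as np

def build_groups(height, width):
    # Enumerate overlapping 3x3 pixel groups obtained by scanning a 3x3 grid
    # over a height-by-width frame; each group is an array of flattened indices.
    groups = []
    for i in range(height - 2):
        for j in range(width - 2):
            idx = [(i + di) * width + (j + dj)
                   for di in range(3) for dj in range(3)]
            groups.append(np.asarray(idx))
    return groups

def structured_norm(s, groups, eta=1.0):
    # Structured sparsity-inducing norm of a vectorized frame s (Eq. (2)):
    # the weighted sum over groups of the l_inf norm restricted to each group.
    return sum(eta * np.max(np.abs(s[g])) for g in groups)

# Example: a 10x10 frame containing a single 2x2 moving blob.
frame = np.zeros((10, 10))
frame[4:6, 4:6] = 1.0
groups = build_groups(10, 10)
print(structured_norm(frame.ravel(), groups))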

Inspired by [32], the equality constraint in Eq. (1) is removed, and we obtain the reformulated optimization problem

(B*, S*) = arg min_{B,S} λ1‖B‖_* + λ2 Σ_{s∈S} ‖s‖_{ℓ1/ℓ∞} + (1/2)‖D − B − S‖_F^2,      (3)

in which λ1 > 0 and λ2 > 0 are the corresponding weights for the low-rank penalty and the structured sparsity penalty.

Guided by the now common reformulation via matrix factorization in [32], [40], [35], we replace the low-rank term ‖B‖_* in Eq. (3) with its factorized approximation, which makes use of the following lemma.

Lemma II.1. Given that B is factorized as B = LR, with L ∈ R^{p×r} and R ∈ R^{r×n}, the nuclear norm of B is upper bounded by half the sum of the squared Frobenius norms of L and R, and

‖B‖_* = inf_{L∈R^{p×r}, R∈R^{r×n}} { (1/2)‖L‖_F^2 + (1/2)‖R‖_F^2 : B = LR }.      (4)

When r > Rank(B), the jointly non-convex quadratic optimization problem in Eq. (4) is equivalent to minimizing the nuclear norm of B.¹
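As a quick numerical sanity check of Lemma II.1 (a sketch on a test matrix of our own choosing, not taken from the paper), the factorization L = U√Σ, R = √Σ V^T obtained from the SVD of B attains the infimum in Eq. (4), so half the sum of the squared Frobenius norms matches the nuclear norm:

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))   # a rank-5 matrix

U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
nuclear_norm = sigma.sum()

# The factorization L = U*sqrt(Sigma), R = sqrt(Sigma)*Vt attains the infimum in Eq. (4).
L = U * np.sqrt(sigma)
R = np.sqrt(sigma)[:, None] * Vt
factorization_bound = 0.5 * (np.linalg.norm(L, "fro") ** 2 + np.linalg.norm(R, "fro") ** 2)

print(nuclear_norm, factorization_bound)   # the two values agree to numerical precision
print(np.allclose(B, L @ R))               # and L R indeed reconstructs B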

By substituting ‖B‖_* with its factorized approximation, we rewrite the optimization problem in Eq. (3) as

(L*, R*, S*) = arg min_{L,R,S} (1/2)‖D − LR − S‖_F^2 + (λ1/2)‖L‖_F^2 + (λ1/2)‖R‖_F^2 + λ2 Σ_{s∈S} ‖s‖_{ℓ1/ℓ∞},      (5)

where L ∈ R^{p×r} is regarded as the subspace basis of the background matrix B, R ∈ R^{r×n} contains the coefficients that reconstruct B for a given L, and r is the estimated dimension of the subspace in which the background frames lie.

With a pair of estimated L and R, each column vector of R corresponds to an estimated background frame in B. Let d_t, r_t and s_t refer to the t-th columns of D, R and S, respectively. The optimization problem in Eq. (5) is then equivalent to minimizing the empirical cost function

f_n(L) = (1/n) Σ_{i=1}^{n} ℓ(d_i, L) + (λ1/2n)‖L‖_F^2,      (6)

in which ℓ(d_i, L) is the reconstruction cost evaluated with L fixed, given by

ℓ(d, L) = min_{r,s} ℓ̂(d, L, r, s),
ℓ̂(d, L, r, s) = (1/2)‖d − Lr − s‖_2^2 + (λ1/2)‖r‖_2^2 + λ2‖s‖_{ℓ1/ℓ∞}.      (7)

Minimizing the empirical cost function in Eq. (6) involves the sum of the reconstruction costs ℓ(·, L), where, for each frame, (r, s) is optimized with the optimization target L treated as a parameter; this formulation fits the max-min optimization form.

¹Please refer to [39] for a detailed proof.

B. Online LSD

Through the above reformulation, it is still impossible to update L without re-estimating all pairs of (r, s), which obstructs processing in an online fashion. To enable online processing, we propose an online algorithm, named Online Low-rank and Structured Sparse Decomposition (O-LSD), in which Foreground and Background Separation and Subspace Basis Update are conducted sequentially for each frame.

O-LSD processes one input frame at each time instance in an online manner. At time instance t, we have the basis L_{t-1} estimated at the previous time instance t − 1. The foreground and background frames are separated by solving the following optimization problem:

(r_t*, s_t*) = arg min_{r_t, s_t} ℓ̂(d_t, L_{t-1}, r_t, s_t).      (8)

We term this procedure Foreground and Background Separation; it is detailed in Section II-C.

Then subspace basis updating is performed using all pairs (r_i, s_i), i ∈ {1, ..., t}. Directly minimizing the empirical cost function defined in Eq. (6) would require re-estimating all of these pairs. Instead, the subspace basis L_t is updated by minimizing a surrogate function g_t(L_t) of the empirical cost function, which provides an upper bound for f_t(L_t), i.e., g_t(L_t) ≥ f_t(L_t). We define the surrogate function as

g_t(L_t) = (1/t) Σ_{i=1}^{t} ℓ̂(d_i, L_t, r_i, s_i) + (λ1/2t)‖L_t‖_F^2
         = (1/t) Σ_{i=1}^{t} ( (1/2)‖d_i − L_t r_i − s_i‖_2^2 + (λ1/2)‖r_i‖_2^2 + λ2‖s_i‖_{ℓ1/ℓ∞} ) + (λ1/2t)‖L_t‖_F^2.      (9)

The minimization of g_t(L_t) with respect to L_t is termed Subspace Basis Update and is explained in Section II-D. The entire O-LSD algorithm is summarized in Algorithm 1.
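The key difference between g_t and the empirical cost f_t in Eq. (6) is that g_t reuses the stored pairs (r_i, s_i) instead of re-optimizing them for the current basis. Below is a minimal sketch of evaluating Eq. (9), assuming the stored triplets (d_i, r_i, s_i) are kept in a list and the groups G are represented as index arrays; surrogate_cost is an illustrative name.

import numpy as np

def surrogate_cost(L, past, lambda1, lambda2, groups, eta=1.0):
    # Evaluate the surrogate g_t(L) of Eq. (9); `past` holds the triplets
    # (d_i, r_i, s_i) accumulated up to the current time instance t.
    t = len(past)
    total = 0.0
    for d, r, s in past:
        total += 0.5 * np.sum((d - L @ r - s) ** 2)        # reconstruction term
        total += 0.5 * lambda1 * np.sum(r ** 2)            # coefficient penalty
        total += lambda2 * sum(eta * np.max(np.abs(s[g])) for g in groups)
    return total / t + lambda1 / (2.0 * t) * np.linalg.norm(L, "fro") ** 2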

C. Foreground and Background Separation

Foreground and Background Separation obtains the pair (r_t*, s_t*) by solving the optimization problem defined in Eq. (8), where L_{t-1} is provided by the previous time instance t − 1. As [L I]^T [L I] is always positive semi-definite, the objective function of Eq. (8) is convex with respect to (r_t, s_t). To solve this convex problem, instead of estimating r_t and s_t jointly, we adopt a Block Coordinate Descent (BCD) method [46], in which r_t and s_t are updated alternately, each with the other fixed.

With s_t fixed, r_t* is obtained by solving

r_t* = arg min_{r_t} (1/2)‖d_t − L_{t-1} r_t − s_t‖_2^2 + (λ1/2)‖r_t‖_2^2,      (10)

Algorithm 1 Proposed O-LSD Algorithm for MOD
Input: d_t ∈ R^p, L_{t-1} ∈ R^{p×r}, A_{t-1} and B_{t-1}
Output: b_t, r_t, s_t and L_t
1: Separate the foreground and background:
   (r_t*, s_t*) = arg min_{r_t, s_t} (1/2)‖d_t − L_{t-1} r_t − s_t‖_2^2 + (λ1/2)‖r_t‖_2^2 + λ2‖s_t‖_{ℓ1/ℓ∞},
   which is solved by Algorithm 2.
2: Compute the background frame: b_t = L_{t-1} r_t.
3: Update the accumulation matrices A_t and B_t by Eq. (17).
4: Update the subspace basis L_t:
   L_t* = arg min_{L_t} Tr(L_t(λ1 I + A_t)L_t^T) − 2 Tr(L_t^T B_t),
   whose solution is given by Algorithm 3.
5: return b_t, r_t, s_t and L_t.

which is a least-squares problem whose closed-form solution is given by

r_t* = (L_{t-1}^T L_{t-1} + λ1 I)^{-1} L_{t-1}^T (d_t − s_t).      (11)

With r_t fixed, let u = d_t − L_{t-1} r_t; the sub-problem for estimating the structured sparse s_t is then defined as

s_t* = arg min_{s_t} (1/2)‖u − s_t‖_2^2 + λ2‖s_t‖_{ℓ1/ℓ∞},      (12)

whose solution is obtained via its dual problem, which defines a quadratic min-cost flow problem [42], [47], [48]:

ξ* = arg min_ξ (1/2)‖u − Σ_{g∈G} ξ^g‖_2^2   s.t.  ∀g ∈ G: ‖ξ^g‖_1 ≤ λ2 and ξ^g_j = 0 if j ∉ g,      (13)

where ξ^g ∈ R^p, ∀g ∈ G, denotes the dual variables corresponding to the group of variables in g, and ξ is the set of all ξ^g, g ∈ G. The primal solution s_t is then recovered by

s_t* = u − Σ_{g∈G} ξ^g*.      (14)
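For intuition, when a single group (or non-overlapping groups) is considered, the dual problem in Eq. (13) reduces to a Euclidean projection onto the ℓ1-ball of radius λ2, and Eq. (14) recovers the primal solution by subtraction; this is the Moreau decomposition of the ℓ∞ norm, whose dual-norm ball is the ℓ1-ball. The overlapping 3×3 groups used in this paper require the quadratic min-cost flow solver of [42], [47], [48]; the sketch below covers only the single-group case, and the function names are illustrative.

import numpy as np

def project_l1_ball(v, radius):
    # Euclidean projection of v onto the l1-ball of the given radius
    # (standard sort-based algorithm).
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, v.size + 1) > (css - radius))[0][-1]
    theta = (css[k] - radius) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(u, lam):
    # prox of lam*||.||_inf at u: equals u minus the projection of u onto the
    # l1-ball of radius lam, i.e. the single-group instance of Eqs. (13)-(14).
    return u - project_l1_ball(u, lam)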

The alternating estimation of r_t and s_t continues until the stopping criterion is reached:

max{ ‖r_t' − r_t''‖_2, ‖s_t' − s_t''‖_2 } / p ≤ τ,      (15)

where (r_t', s_t') and (r_t'', s_t'') are the estimates at two consecutive iterations. Similar to [33], [35], the stopping tolerance is set to τ = 1.0e−5 in this paper. The BCD algorithm for Foreground and Background Separation is summarized in Algorithm 2.


Algorithm 2 Block Coordinate Descent Method for Foreground and Background Separation
Input: d_t ∈ R^p, L_{t-1} ∈ R^{p×r}, λ1 > 0 and λ2 > 0
Output: r_t and s_t
1: r_t = 0, s_t = 0.
2: while not converged do
3:   Estimate r_t: r_t = (L_{t-1}^T L_{t-1} + λ1 I)^{-1} L_{t-1}^T (d_t − s_t).
4:   Estimate s_t by solving its dual problem on ξ defined in Eq. (13), then s_t = d_t − L_{t-1} r_t − Σ_{g∈G} ξ^g*.
5:   Check convergence using Eq. (15).
6: end while
7: return r_t and s_t.
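A compact Python sketch of Algorithm 2 is given below. To keep it self-contained, the structured-sparsity sub-problem is solved tile-by-tile on non-overlapping 3×3 tiles (obtained by partitioning the frame), whereas the paper uses overlapping groups handled through the min-cost flow dual of Eq. (13); the closed form of Eq. (11) and the stopping rule of Eq. (15) follow the text, and all function names are illustrative.

import numpy as np

def _project_l1_ball(v, radius):
    # Euclidean projection onto the l1-ball (same routine as sketched earlier).
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, v.size + 1) > (css - radius))[0][-1]
    theta = (css[k] - radius) / (k + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def separate_foreground_background(d, L_prev, lambda1, lambda2, tiles,
                                   tol=1e-5, max_iter=100):
    # Block coordinate descent of Algorithm 2, simplified to non-overlapping tiles.
    p, r = L_prev.shape
    rt, st = np.zeros(r), np.zeros(p)
    gram = L_prev.T @ L_prev + lambda1 * np.eye(r)            # system matrix of Eq. (11)
    for _ in range(max_iter):
        rt_new = np.linalg.solve(gram, L_prev.T @ (d - st))   # Eq. (11)
        u = d - L_prev @ rt_new
        st_new = np.zeros(p)
        for g in tiles:                                       # Eqs. (12)-(14), per tile
            st_new[g] = u[g] - _project_l1_ball(u[g], lambda2)
        converged = max(np.linalg.norm(rt_new - rt),
                        np.linalg.norm(st_new - st)) / p <= tol   # Eq. (15)
        rt, st = rt_new, st_new
        if converged:
            break
    return rt, st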

D. Subspace Basis Update

After estimating (r_t, s_t), the subspace basis L_t is updated by minimizing the surrogate function g_t(L_t) of the empirical cost function, which defines the optimization problem

L_t* = arg min_{L_t} g_t(L_t) = arg min_{L_t} Tr(L_t(λ1 I + A_t)L_t^T) − 2 Tr(L_t^T B_t),      (16)

in which Tr(·) denotes the trace of a matrix, and A_t and B_t are two auxiliary accumulation matrices introduced to avoid duplicated calculations at each time instance:

A_t = A_{t-1} + r_t r_t^T,   B_t = B_{t-1} + (d_t − s_t) r_t^T.      (17)

Similar to [32], [33], [41], the optimization problem defined in Eq. (16) is solved by a block coordinate descent method, which avoids inverting a large matrix. The Subspace Basis Update procedure is illustrated in Algorithm 3.

Algorithm 3 Block Coordinate Descent Method for Subspace Basis Update
Input: L_{t-1} = [l_1, ..., l_r] ∈ R^{p×r}, A_t ∈ R^{r×r}, B_t = [b_1, ..., b_r] ∈ R^{p×r} and λ1 > 0
Output: L_t
1: Ā = A_t + λ1 I, with columns [ā_1, ..., ā_r].
2: for i = 1 to r do
3:   l_i ← l_i + (b_i − L_{t-1} ā_i) / Ā_{i,i}.
4: end for
5: L_t = [l_1, ..., l_r].
6: return L_t.
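The two steps of the Subspace Basis Update admit a direct translation into Python. The sketch below follows Eq. (17) and the printed column sweep of Algorithm 3, in which the residual is formed with L_{t-1}; an in-place (Gauss-Seidel) variant that uses the partially updated basis is also common in online dictionary learning. Function names are illustrative.

import numpy as np

def update_accumulators(A, B, d, r, s):
    # Accumulation matrices of Eq. (17):
    # A_t = A_{t-1} + r r^T and B_t = B_{t-1} + (d - s) r^T.
    return A + np.outer(r, r), B + np.outer(d - s, r)

def update_subspace_basis(L_prev, A, B, lambda1):
    # Column-wise block coordinate descent of Algorithm 3 on Eq. (16),
    # which updates the basis column by column without an explicit matrix inverse.
    A_bar = A + lambda1 * np.eye(A.shape[0])
    L = L_prev.copy()
    for i in range(L.shape[1]):
        L[:, i] = L_prev[:, i] + (B[:, i] - L_prev @ A_bar[:, i]) / A_bar[i, i]
    return L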

The subspace basis L_0 is initialized before starting O-LSD. L_0 can be initialized either from the first few frames of the given sequence or from their principal components [35], [34]. In satellite videos, however, objects move slowly, and these initialization schemes risk absorbing the slowly moving foreground objects into the background, which degrades the detection performance, unless a longer sequence is used for initialization. A satellite video of adequate length is not yet technically available. Therefore, for satellite videos, we recommend initializing L_0 with random values instead, which performs well in practice.
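A minimal sketch of the two initialization options discussed above (random values, recommended for satellite videos, versus the principal components of the first few frames); init_subspace_basis is an illustrative name.

import numpy as np

def init_subspace_basis(p, r, first_frames=None, seed=0):
    # Random initialization is recommended for satellite videos; the PCA-style
    # alternative uses the leading r left singular vectors of the first frames.
    if first_frames is None:
        rng = np.random.default_rng(seed)
        return rng.standard_normal((p, r))
    U, _, _ = np.linalg.svd(np.stack(first_frames, axis=1), full_matrices=False)
    return U[:, :r]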

E. Convergence Analysis

One technical contribution of this paper is to show that the proposed O-LSD algorithm converges almost surely under a mild condition.

Assumption 1. The observed data are uniformly bounded, and the samples are independent.

As a widely used assumption [42], [32], [41], the boundedness of the observed data is quite natural for real videos. Based on the above assumption, we present our first conclusion on the convergence of the surrogate function g_t(L_t).

Theorem II.2. Let {L_t}_{t=1}^∞ be the sequence of solutions obtained by Algorithm 1. The surrogate function g_t(L_t) converges almost surely.

Similarly, we characterize the distance between the solutions obtained by Algorithm 1 at two consecutive time instances.

Theorem II.3. For the solutions produced by Algorithm 1 at two consecutive time instances, ‖L_t − L_{t+1}‖_F = O(1/t).

Then we analyze the gap between the empirical cost function f_t(L_t) and its surrogate function g_t(L_t) at the estimated L_t.

Theorem II.4. Let f_t(L) be the empirical cost function defined in Eq. (6) and g_t(L) its surrogate function defined in Eq. (9), and let L_t be the solution obtained by Algorithm 1. As t tends to infinity, g_t(L_t) − f_t(L_t) converges to 0 almost surely.

In stochastic optimization, the expected cost function over L is defined as

f(L) = E_d[ℓ(d, L)] = lim_{t→∞} f_t(L),      (18)

and we then present the convergence of the gap between the expected cost function f(L_t) and the surrogate function g_t(L_t).

Theorem II.5. As t tends to infinity, given that L_t is obtained by Algorithm 1, g_t(L_t) − f(L_t) converges to 0 almost surely.

Furthermore, the solution L_t obtained by Algorithm 1 is not necessarily a stationary point of the expected cost function f(L) as t tends to infinity, in contrast to what is proved in [42], [32], [41]. Due to the existence of more than one solution to Eq. (8), ℓ̂(d, L, r, s) is no longer strictly convex (or strongly convex) with respect to (r, s), and the gradient of the expected cost function, ∇_L f(L), is no longer Lipschitz. Therefore, ∇_L f(L) need not vanish as t tends to infinity, from which we conclude that the solution L_t may not be a stationary point of the expected cost function in the limit.

Please refer to the appendices for detailed proofs of the presented theorems.


III. EXPERIMENTS

The detection performance of O-LSD was evaluated on a dataset of two satellite videos. This dataset is constructed from a satellite video captured over Las Vegas, USA, on March 25, 2014, with a spatial resolution of 1.0 meter and a frame rate of 30 frames per second. Both videos contain 700 frames with bounding boxes for moving vehicles as ground truth, and details of both videos are listed in Table II². In this paper, we used the first 200 frames of each video for the discussion on parameter selection, and the remaining frames were utilized for performance evaluation against existing state-of-the-art methods.

TABLE II
INFORMATION ON THE EVALUATION DATASETS

Video | Frame Size | Cross Validation #Frames | Cross Validation #Vehicles | Performance Evaluation #Frames | Performance Evaluation #Vehicles
001 | 400×400 | 200 | 9306 | 500 | 18167
002 | 600×400 | 200 | 13443 | 500 | 39362

Moving object detection performance is evaluated with recall, precision and F1 scores, given by

recall = TP / (TP + FN),
precision = TP / (TP + FP),
F1 = 2 × recall × precision / (recall + precision),      (19)

where TP denotes the number of correct detections, and FN and FP are the numbers of missed detections and false alarms, respectively. In this paper, a correct detection is one whose maximum Intersection over Union (IoU) against the ground truth exceeds a threshold. To accommodate the small size of vehicles in satellite videos, the threshold is set to 0.3³. We refer to the 5-Frame detection performance at each time instance as the metrics obtained from its 5 latest frames, which is used to observe convergence; the accumulated detection performance is measured on all frames before the current time instance and is used to compare overall performance over a sequence.
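For reproducibility, the evaluation in Eq. (19) can be operationalized as below. The paper does not spell out the matching procedure, so a greedy one-to-one matching at the 0.3 IoU threshold is assumed here; boxes are given as (x1, y1, x2, y2) and all names are illustrative.

def iou(box_a, box_b):
    # Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def frame_metrics(detections, groundtruth, iou_thr=0.3):
    # Greedy matching: a detection is a true positive when its best IoU against
    # a still-unmatched ground-truth box reaches the threshold.
    matched, tp = set(), 0
    for det in detections:
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(groundtruth):
            if j in matched:
                continue
            v = iou(det, gt)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= iou_thr:
            matched.add(best_j)
            tp += 1
    fp = len(detections) - tp
    fn = len(groundtruth) - tp
    recall = tp / (tp + fn + 1e-12)
    precision = tp / (tp + fp + 1e-12)
    f1 = 2 * recall * precision / (recall + precision + 1e-12)
    return recall, precision, f1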

A. Parameter Setting

The performance of O-LSD is controlled by the dimension of the estimated subspace, r, and the two weights for the low-rank term and the structured sparsity penalty, λ1 and λ2, respectively. The following experiments are conducted on the cross-validation sequence from Video 001.

The weight λ1 controls the importance of the low-rank subspace term. With λ2 fixed, an overly small λ1 encourages too much information to be encoded in the low-rank subspace factors and hurts the detection performance in terms of both recall and precision.

²Moving vehicles are manually labeled with the Computer Vision Annotation Tool (CVAT), and a bounding box is provided for each moving object in each frame.

³The estimated foreground takes continuous values rather than binary values, so we apply threshold segmentation as post-processing to extract the foreground mask and the moving objects [49].

Fig. 1. Moving object detection performance (F1, recall and precision) with different λ1 and λ2: (a) varying λ1 with λ2 = 0.025; (b) varying λ2 with λ1 = 0.0025.

Increasing λ1 improves the detection performance through the increased emphasis on the low-rank subspace modeling. After the best detection performance is reached, further increasing λ1 prevents information from being encoded into the background. As illustrated in Fig. 1a, with λ2 = 0.025 and r = 5 fixed, the F1 score gradually increases from around 60% to about 75% as λ1 increases, and then the detection performance starts dropping as λ1 continues to increase.

Increasing λ2 with λ1 fixed puts more emphasis on the structured sparsity of the extracted foreground, which improves the precision of the detected moving objects by suppressing random noise. When λ2 keeps increasing, the weight of the structured sparsity norm becomes too large for information to be encoded into the foreground, which then decreases the detection performance. As illustrated in Fig. 1b, with λ1 = 0.0025 and r = 5 fixed, the F1 score first reaches its highest point as λ2 increases, and then drops when λ2 becomes too large.

Then we discuss the selection of the subspace dimension r. With λ1 and λ2 fixed, O-LSD with a smaller r tends to converge faster; however, it may fail to model the variation of the background over a long video sequence. On the contrary, selecting a larger r slows the updating of the subspace basis and requires more frames before O-LSD converges. As presented in Fig. 2, for r ≤ 10 the 5-frame F1 scores increase faster than those with r ≥ 25. As the sequence length increases, the 5-frame F1 scores of O-LSD with r ≤ 10 show a dropping trend. For r = 25 this trend is negligible, and the same metric is still rising for r > 25, which means that O-LSD requires a longer sequence to converge. The highest F1 score at Frame 200 is achieved with r = 25 in Fig. 2.

In the rest of the paper, based on cross-validation, we select λ1 = 1/√p and λ1/λ2 = 0.1 with r = 25 for evaluation; further fine-tuning of the parameters may improve the detection performance of O-LSD.


Fig. 2. 5-Frame performance evaluation (F1) of O-LSD with different r on the cross-validation sequence from Video 001: (a) r ∈ {1, 2, 3, 5, 10}; (b) r ∈ {10, 25, 50, 75, 100}.

TABLE III
DETECTION PERFORMANCE COMPARISON WITH ONLINE ALGORITHMS

Method | Video 001 Recall | Video 001 Precision | Video 001 F1 | Video 002 Recall | Video 002 Precision | Video 002 F1 | Avg(F1)
GRASTA | 76.96% | 31.76% | 44.97% | 72.22% | 46.73% | 56.75% | 50.86%
OR-PCA | 66.51% | 41.50% | 51.11% | 71.94% | 73.79% | 72.86% | 61.99%
GOSUS | 59.35% | 49.15% | 53.77% | 68.03% | 62.61% | 65.21% | 59.49%
O-LSD | 64.99% | 63.75% | 64.36% | 73.00% | 90.21% | 80.69% | 72.48%

B. Comparison with Online Approaches

To verify the effectiveness of O-LSD, its detection results are compared against the state-of-the-art online approaches GRASTA [31], OR-PCA [32] and GOSUS [34]. For all methods, the subspace is initialized by the random scheme, and the parameters are selected through cross-validation.

O-LSD boosts the detection performance in terms of precision and F1 scores. This improvement should be attributed to the structured sparsity penalty, which suppresses the random noise in the estimated foreground frames. As presented in Table III, O-LSD achieves the highest precision and F1 scores on both videos, although there is a small drop in recall on Video 001 compared with GRASTA and OR-PCA. Among online methods employing a structured sparsity penalty, O-LSD outperforms GOSUS on both satellite videos. GOSUS has proved effective for common moving object detection tasks; however, it brings no improvement over OR-PCA on the satellite videos.

Another advantage of O-LSD is its faster convergence compared with the other state-of-the-art online algorithms. As demonstrated in Fig. 3, on both videos O-LSD achieves higher F1 scores earlier than GRASTA, OR-PCA and GOSUS, implying that O-LSD converges faster by introducing the structured sparsity penalty. A similar trend can be observed in the accumulated detection performance: the accumulated performance of O-LSD remains better than that of the three existing methods as the sequence length increases, as illustrated in Fig. 4.

Besides, as more frames are processed by the O-LSD algorithm, the cleanness of the estimated background frame gradually improves, as shown in Fig. 5.

C. Comparison with Batch-based Approaches

Besides the comparison against the state-of-the-art online algorithms, we also compare O-LSD with the batch-based algorithms RPCA [50] and LSD [29].

TABLE IV
DETECTION PERFORMANCE OF O-LSD WITH TEMPORAL DOWN-SAMPLING

Video | T | Recall | Precision | F1 | Time per Frame
001 | 1 | 64.99% | 63.75% | 64.36% | 6.57 s
001 | 3 | 67.85% | 59.79% | 63.57% | 12.74 s
001 | 5 | 69.22% | 62.02% | 65.42% | 14.62 s
001 | 10 | 67.72% | 62.14% | 64.91% | 14.26 s
002 | 1 | 73.00% | 90.21% | 80.69% | 10.75 s
002 | 3 | 74.21% | 88.46% | 80.71% | 14.48 s
002 | 5 | 74.41% | 87.72% | 80.52% | 17.86 s
002 | 10 | 73.46% | 88.68% | 80.36% | 21.12 s

O-LSD achieves slightly lower detection performance with a significantly reduced processing delay on both videos.

Compared with the batch-based approach RPCA, O-LSD achieves comparable performance, since it generates fewer false alarms, as visualized in Fig. 6. O-LSD does not match the detection performance of its batch-based counterpart, LSD, and Table V shows a drop of about 10% in F1 score for O-LSD. This moderate gap in detection performance between LSD and O-LSD implies that O-LSD still works well in practice, although it may not converge to the global optimum of LSD.

In terms of processing time per frame, O-LSD reduces this metric significantly compared with LSD. As presented in Table V, the time cost per frame of O-LSD is roughly ten times smaller than that of LSD. Compared with RPCA, O-LSD improves the detection performance at a moderately increased time cost per frame. Since the detection results of the batch-based approaches are not available until the entire optimization is completed, O-LSD significantly reduces the delay in moving object detection.


Fig. 3. 5-Frame detection performance (F1) of different online algorithms (O-LSD, GOSUS, OR-PCA, GRASTA): (a) Video 001; (b) Video 002.

Fig. 4. Accumulated detection performance (F1) of different online algorithms (O-LSD, GOSUS, OR-PCA, GRASTA): (a) Video 001; (b) Video 002.

TABLE V
DETECTION PERFORMANCE COMPARISON AGAINST BATCH-BASED ALGORITHMS

Method | Video 001 Recall | Precision | F1 | Time per Frame | Video 002 Recall | Precision | F1 | Time per Frame | Avg(F1)
RPCA | 94.57% | 40.65% | 56.86% | 3.16 s | 90.15% | 78.06% | 83.67% | 5.45 s | 70.27%
LSD | 86.80% | 70.79% | 77.98% | 68.48 s | 82.19% | 90.87% | 86.31% | 119.50 s | 82.15%
O-LSD | 64.99% | 63.75% | 64.36% | 6.57 s | 73.00% | 90.21% | 80.69% | 10.75 s | 72.48%

D. Performance Evaluation with Temporal Down-sampling

Additionally, we evaluate the effect of temporal down-sampling on detection performance. In this set of experiments, one frame out of every T frames, T ∈ {1, 3, 5, 10}, is fed to O-LSD.

As shown in Table IV, as T increases from 1 to 10, the F1 scores fluctuate only negligibly, which implies that the detection performance of O-LSD is almost unaffected by temporal down-sampling. At the same time, the time costs per frame are more or less the same across the different down-sampling rates. Since fewer frames need to be processed at larger down-sampling scales, the overall processing delay is reduced. However, considering that it takes more than one second to obtain the detection result for each frame, it is still hard to apply O-LSD directly to real-time applications without appropriate acceleration.

IV. CONCLUSION

The main contribution of this paper is an effective algorithm, Online Low-rank and Structured Sparse Decomposition (O-LSD), which combines stochastic optimization and a structured sparsity penalty to improve the online subspace estimation

method for moving object detection in satellite videos. We elaborate the O-LSD model and its optimization method, which is proved to converge almost surely under a mild condition. Experiments on a dataset of two satellite videos validate the improvement of O-LSD over existing state-of-the-art online approaches. With the temporal down-sampling scheme, O-LSD further reduces the processing delay with almost unchanged performance.

ACKNOWLEDGMENT

This work is partially supported by the China Scholarship Council. The authors would like to thank the Planet Team for providing the data used in this research [13].

APPENDIX A
TECHNICAL LEMMAS

Lemma A.1 (Danskin's Theorem from [51]). Let C ⊂ R^m be a compact set. The function ℓ(x, u) : R^n × C → R is continuous, and ℓ(·, u) is convex with respect to x for every u ∈ C. Define ℓ(x) = min_{u∈C} ℓ(x, u) and C(x) = {u* | u* = arg min_{u∈C} ℓ(x, u)}.


Fig. 5. Visualization of the estimated background and detection results of O-LSD on Video 001 at Frames 10, 25, 50, 100, 250 and 500. Images in the first row are the original inputs; those in the second and third rows are the estimated background frames and detection results, respectively.

Fig. 6. Detection results on Frame 250 obtained by different algorithms: input, ground truth, RPCA, LSD, GRASTA, OR-PCA, GOSUS and O-LSD.

If ℓ(x, u) is differentiable with respect to x for all u ∈ C, and ∂ℓ(x, u)/∂x is continuous with respect to u for all x, then the sub-gradient of ℓ(x) is given by

∂_x ℓ(x) = conv{ ∂ℓ(x, u)/∂x | u ∈ C(x) },      (20)

where conv{·} denotes the convex hull operator.

Lemma A.2 (Lemma 2.6 from [52]). Let f : X → R be a convex function. Then f is L-Lipschitz over X with respect to a norm ‖·‖ if and only if, for all w ∈ X and z ∈ ∂f(w), we have ‖z‖_* ≤ L, where ‖·‖_* is the dual norm.

Lemma A.3 (Sufficient condition for the convergence of a stochastic process, from [52]). Let (Ω, F, P) be a measurable probability space, u_t, for t > 0, the realization of a stochastic process, and F_t the filtration determined by the past information at time t. Let

δ_t = 1 if E[u_{t+1} − u_t | F_t] > 0, and δ_t = 0 otherwise.      (21)

If, for all t, u_t ≥ 0 and Σ_{t=1}^{∞} E[δ_t(u_{t+1} − u_t)] < ∞, then u_t is a quasi-martingale and converges almost surely. Moreover,

Σ_{t=1}^{∞} |E[u_{t+1} − u_t | F_t]| < +∞ a.s.      (22)

Lemma A.4 (Corollary of the Donsker theorem from [53]). Let F = {f_θ : X → R, θ ∈ Θ} be a set of measurable functions indexed by a bounded subset Θ of R^d. Suppose that there exists a constant K such that

|f_{θ1}(x) − f_{θ2}(x)| ≤ K ‖θ1 − θ2‖_2,      (23)

for every θ1 and θ2 in Θ and x in X. Then F is P-Donsker. For any f in F, define P_n f, P f and G_n f as

P_n f = (1/n) Σ_{i=1}^{n} f(X_i),   P f = E[f(X)],   G_n f = √n (P_n f − P f).      (24)

Suppose also that, for all f, P f² ≤ δ² and ‖f‖_∞ ≤ M, and that the random variables X_1, X_2, ... are Borel-measurable. Then we have

E|G_n|_F = O(1),      (25)

where |G_n|_F = sup_{f∈F} |G_n f|.

Lemma A.5 (Positive converging sums from [42]). Let {a_n} and {b_n} be two real sequences such that, for all n, a_n ≥ 0, b_n ≥ 0, Σ_{n=1}^{∞} a_n = ∞, Σ_{n=1}^{∞} a_n b_n < ∞, and there exists K > 0 such that |b_{n+1} − b_n| < K a_n. Then lim_{n→∞} b_n = 0.

APPENDIX B
PROPOSITIONS

Proposition B.1. Assume d ∈ D is uniformly bounded, and let (r*, s*) be the minimizer of the reconstruction cost function ℓ̂(d, L, r, s) obtained by Algorithm 2. Then,
1) r* and s* are uniformly bounded;
2) (1/t)A_t and (1/t)B_t are uniformly bounded;
3) L_t is supported on a compact subset L.

Proof. Since (0, d) is a non-trivial feasible solution to Eq. (7), for the optimal solution (r*, s*),

(1/2)‖d − Lr* − s*‖_2^2 + (λ1/2)‖r*‖_2^2 + λ2‖s*‖_{ℓ1/ℓ∞} ≤ ℓ̂(d, L, 0, d) ≤ λ2‖d‖_{ℓ1/ℓ∞},      (26)

thus we obtain

‖r*‖_2^2 ≤ (2λ2/λ1)‖d‖_{ℓ1/ℓ∞},   ‖s*‖_{ℓ1/ℓ∞} ≤ ‖d‖_{ℓ1/ℓ∞}.      (27)

Based on the assumption that d is uniformly bounded, r* and s* are uniformly bounded.

Similarly, we show that the accumulation matrices A_t and B_t are also uniformly bounded, as

(1/t)A_t = (1/t) Σ_{i=1}^{t} r_i r_i^T,   (1/t)B_t = (1/t) Σ_{i=1}^{t} (d_i − s_i) r_i^T.      (28)

The closed-form solution L_t is given as

L_t = B_t(A_t + λ1 I)^{-1} = (1/t)B_t ((1/t)A_t + (λ1/t) I)^{-1},      (29)

in which (1/t)A_t and (1/t)B_t are uniformly bounded; therefore, L_t is uniformly bounded.

Proposition B.2. Let r, s and L_t be the solutions obtained by Algorithm 1. Then
1) ℓ̂(d, L, r, s) and ℓ(d, L) are uniformly bounded;
2) the surrogate function g_t(L) is uniformly bounded and Lipschitz.

Proof. The first claim follows by combining the definition of ℓ̂(d, L, r, s) with the uniform boundedness of d, L, r and s. Similarly, we can show that g_t(L) is uniformly bounded.

To prove that g_t(L) is Lipschitz, we show that the gradient of g_t(L) is uniformly bounded:

‖∇g_t(L)‖_F = ‖L((1/t)A_t + (λ1/t)I) − (1/t)B_t‖_F ≤ ‖L‖_F (‖(1/t)A_t‖_F + ‖(λ1/t)I‖_F) + ‖(1/t)B_t‖_F,      (30)

where the terms on the right-hand side of the inequality are uniformly bounded, so ‖∇g_t(L)‖_F is uniformly bounded. Since g_t(L) is convex with respect to L, by Lemma A.2 the boundedness of the gradient implies that g_t(L) is Lipschitz.

Proposition B.3. Let X_0(L) denote the set of all minimizers of ℓ̂(d, L, r, s), i.e.,

X_0(L) = {(r, s) | (r, s) = arg min_{r,s} ℓ̂(d, L, r, s)}.      (31)

1) The sub-gradient of ℓ(d, L) with respect to L is given by

∂_L ℓ(d, L) = conv{ (Lr* + s* − d)r*^T | (r*, s*) ∈ X_0(L) },      (32)

where conv{·} is the convex hull operator.
2) The sub-gradient ∂_L ℓ(d, L) is uniformly bounded, and ℓ(d, L) is uniformly Lipschitz.

Proof. To the best of our knowledge, there is no available necessary and sufficient condition for the uniqueness of the minimizer of the reconstruction cost function, which implies that more than one minimizer of ℓ̂(d, L, r, s) may exist.

ℓ̂(d, L, r, s) is convex and differentiable with respect to L for every feasible (r*, s*). Since ∂ℓ̂(d, L, r, s)/∂L = (Lr + s − d)r^T is continuous with respect to (r, s) for all L, according to Lemma A.1 the sub-gradient of ℓ(d, L) is given by

∂_L ℓ(d, L) ∈ conv{ (Lr* + s* − d)r*^T | (r*, s*) ∈ X_0(L) }.      (33)

Then we prove that (Lr + s − d)r^T is uniformly bounded:

‖(Lr + s − d)r^T‖_2 ≤ ‖r‖_2 (‖L‖_F‖r‖_2 + ‖s‖_2 + ‖d‖_2),      (34)

in which every term on the right-hand side of the inequality is uniformly bounded; therefore (Lr + s − d)r^T is uniformly bounded. The convex hull of a bounded set is also bounded, thus the sub-gradient ∂_L ℓ(d, L) is bounded.

By Lemma A.2, since ℓ(d, L) is convex with respect to L and its sub-gradient is uniformly bounded, ℓ(d, L) is uniformly Lipschitz.

Proposition B.4. The empirical cost function f_t(L) is uniformly bounded and Lipschitz.

Proof. By the definition of the empirical cost function f_n(L), f_n(L) is uniformly bounded.

For g_i ∈ ∂_L ℓ(d_i, L), the sub-gradient of f_n(L) is uniformly bounded:

‖∂_L f_n(L)‖_F = ‖(1/n) Σ_{i=1}^{n} g_i + (λ1/n) L‖_F ≤ (1/n) Σ_{i=1}^{n} ‖g_i‖_F + (λ1/n)‖L‖_F.      (35)

Because f_n(L) is convex with respect to L and its sub-gradient is uniformly bounded, f_n(L) is Lipschitz.

APPENDIX C
PROOF DETAILS

Theorem C.1. Let {L_t}_{t=1}^∞ be the sequence of solutions obtained by Algorithm 1. The surrogate function g_t(L_t) converges almost surely.

Proof. In stochastic optimization, the expected cost function is defined over all the samples,

f(L) = E_d[ℓ(d, L)] = lim_{n→∞} f_n(L).      (36)

g_t(L_t) can be analyzed as a non-negative stochastic process, since each term in it is non-negative and the samples are drawn independently.

Let u_t = g_t(L_t). The difference between two consecutive time instances is given as

u_{t+1} − u_t = g_{t+1}(L_{t+1}) − g_t(L_t)
= g_{t+1}(L_{t+1}) − g_{t+1}(L_t) + g_{t+1}(L_t) − g_t(L_t)
= g_{t+1}(L_{t+1}) − g_{t+1}(L_t) + (ℓ(d_{t+1}, L_t) − f_t(L_t))/(t + 1) + (f_t(L_t) − g_t(L_t))/(t + 1).      (37)

The first two terms satisfy g_{t+1}(L_{t+1}) ≤ g_{t+1}(L_t) and f_t(L_t) − g_t(L_t) ≤ 0; therefore,

u_{t+1} − u_t ≤ (ℓ(d_{t+1}, L_t) − f_t(L_t))/(t + 1) ≤ (ℓ(d_{t+1}, L_t) − (1/t) Σ_{i=1}^{t} ℓ(d_i, L_t))/(t + 1).      (38)

The expectation conditioned on the past information F_t is given as

E[u_{t+1} − u_t | F_t] ≤ (E[ℓ(d_{t+1}, L_t) | F_t] − (1/t) Σ_{i=1}^{t} ℓ(d_i, L_t))/(t + 1) ≤ (f(L_t) − (1/t) Σ_{i=1}^{t} ℓ(d_i, L_t))/(t + 1) ≤ ‖f − f_t‖_∞/(t + 1),      (39)

where we define f_t = (1/t) Σ_{i=1}^{t} ℓ(d_i, L_t), f = E_d[ℓ(d, L_t)], and ‖f − f_t‖_∞ = sup_{f∈F} |f − f_t|. Here F = {ℓ(d, L) : D → R, L ∈ L} defines a set of measurable functions indexed by L_t from a compact subset L, and it is P-Donsker. In addition, the boundedness of ℓ(d, L) implies that E[ℓ(d, L)²] is uniformly bounded. The requirements of Lemma A.4 are thus all satisfied, so that

E[√t ‖f − f_t‖_∞] ≤ κ for some κ > 0.      (40)

Therefore,

E[[E[u_{t+1} − u_t | F_t]]^+] = E[max{0, E[u_{t+1} − u_t | F_t]}] ≤ κ / t^{3/2},      (41)

where [·]^+ is the positive variation operator. We then apply Lemma A.3 to establish the convergence of g_t(L_t). We define

δ_t = 1 if E[u_{t+1} − u_t | F_t] > 0, and δ_t = 0 otherwise.      (42)

We then have

Σ_{t=1}^{∞} E[δ_t(u_{t+1} − u_t) | F_t] = Σ_{t=1}^{∞} E[[E[u_{t+1} − u_t | F_t]]^+] ≤ Σ_{t=1}^{∞} κ / t^{3/2} < +∞.      (43)

Consequently, g_t(L_t) is a quasi-martingale and converges almost surely. Moreover,

Σ_{t=1}^{∞} |E[u_{t+1} − u_t | F_t]| < +∞ a.s.      (44)

Theorem C.2. For the solutions produced by Algorithm 1 at two consecutive time instances,

‖L_t − L_{t+1}‖_F = O(1/t).      (45)

Proof. The Hessian of g_t(L) is H = I ⊗ (A_t + λ1 I), where ⊗ is the Kronecker product. By the definition of A_t, A_t is always positive semi-definite; therefore the smallest eigenvalue of H is at least λ1, which implies that g_t(L) is strictly convex (indeed strongly convex) with respect to L, and

g_t(L_{t+1}) − g_t(L_t) ≥ λ1 ‖L_{t+1} − L_t‖_F^2.      (46)


Because L_{t+1} minimizes g_{t+1}(L), g_{t+1}(L_{t+1}) − g_{t+1}(L_t) ≤ 0, which gives

g_t(L_{t+1}) − g_t(L_t)
= g_t(L_{t+1}) − g_{t+1}(L_{t+1}) + g_{t+1}(L_{t+1}) − g_t(L_t)
≤ g_t(L_{t+1}) − g_{t+1}(L_{t+1}) + g_{t+1}(L_t) − g_t(L_t)
= (g_t(L_{t+1}) − g_{t+1}(L_{t+1})) − (g_t(L_t) − g_{t+1}(L_t)).      (47)

We define G_t(L) = g_t(L) − g_{t+1}(L); its gradient is

∇G_t(L) = ∇g_t(L) − ∇g_{t+1}(L)
= (1/t)L Ā_t − (1/t)B_t − (1/(t+1))L Ā_{t+1} + (1/(t+1))B_{t+1}
= (1/t)L(Ā_t − (t/(t+1)) Ā_{t+1}) − (1/t)(B_t − (t/(t+1)) B_{t+1}),      (48)

in which Ā_t = A_t + λ1 I.

Given that A_t, B_t and L are uniformly bounded, the gradient ∇G_t(L) is also uniformly bounded:

‖∇G_t(L)‖_F ≤ (1/t)(‖L‖_F ‖Ā_t − (t/(t+1)) Ā_{t+1}‖_F + ‖B_t − (t/(t+1)) B_{t+1}‖_F).      (49)

Thus G_t(L) is uniformly Lipschitz with respect to L, and there exists a positive constant κ_t = (1/t)(‖L‖_F ‖Ā_t − (t/(t+1)) Ā_{t+1}‖_F + ‖B_t − (t/(t+1)) B_{t+1}‖_F) that satisfies

G_t(L_t) − G_t(L_{t+1}) ≤ κ_t ‖L_{t+1} − L_t‖_F.      (50)

Combining Eqs. (46), (47) and (50), we conclude that

‖L_{t+1} − L_t‖_F ≤ κ_t / λ1,      (51)

and hence ‖L_{t+1} − L_t‖_F = O(1/t).

Theorem C.3. Let f_t(L) be the empirical cost function and g_t(L) its surrogate function, and let L_t be the solution obtained by Algorithm 1. As t tends to infinity, g_t(L_t) − f_t(L_t) converges to 0 almost surely.

Proof. This proof was originally presented in [32]; we include it here for completeness. Combining Eq. (37) with g_{t+1}(L_{t+1}) − g_{t+1}(L_t) ≤ 0, we obtain

(g_t(L_t) − f_t(L_t))/(t + 1) ≤ (ℓ(d_{t+1}, L_t) − f_t(L_t))/(t + 1) − (g_{t+1}(L_{t+1}) − g_t(L_t))
≤ (ℓ(d_{t+1}, L_t) − f_t(L_t))/(t + 1) + [g_{t+1}(L_{t+1}) − g_t(L_t)]^−,      (52)

in which [·]^− denotes the negative variation operator. Similar to the proof of Theorem C.1, we take the expectation conditioned on the past information F_t:

E[(g_t(L_t) − f_t(L_t))/(t + 1) | F_t] = (g_t(L_t) − f_t(L_t))/(t + 1)
≤ E[(ℓ(d_{t+1}, L_t) − f_t(L_t))/(t + 1) | F_t] + E[[g_{t+1}(L_{t+1}) − g_t(L_t)]^− | F_t].      (53)

Accumulating (g_t(L_t) − f_t(L_t))/(t + 1) as t tends to ∞, we have

Σ_{t=1}^{∞} (g_t(L_t) − f_t(L_t))/(t + 1)
≤ Σ_{t=1}^{∞} E[(ℓ(d_{t+1}, L_t) − (1/t) Σ_{i=1}^{t} ℓ(d_i, L_t))/(t + 1) | F_t] + Σ_{t=1}^{∞} E[[g_{t+1}(L_{t+1}) − g_t(L_t)]^− | F_t]
≤ Σ_{t=1}^{∞} |f − f_t|/(t + 1) + Σ_{t=1}^{∞} E[[g_{t+1}(L_{t+1}) − g_t(L_t)]^− | F_t].      (54)

According to the central limit theorem, √t |f − f_t| converges almost surely as t tends to infinity. From Theorem C.1, we also obtain that

Σ_{t=1}^{∞} |E[[g_{t+1}(L_{t+1}) − g_t(L_t)]^− | F_t]| < +∞.      (55)

Hence, we have the almost sure convergence of the positive sum

Σ_{t=1}^{∞} (g_t(L_t) − f_t(L_t))/(t + 1) ≤ Σ_{t=1}^{∞} |f − f_t|/(t + 1) + Σ_{t=1}^{∞} E[[g_{t+1}(L_{t+1}) − g_t(L_t)]^− | F_t] < ∞.      (56)

As demonstrated in Propositions B.2 and B.4, g_t(L) and f_t(L) are both Lipschitz with respect to L, which implies that there exists κ2 such that

|g_{t+1}(L_{t+1}) − f_{t+1}(L_{t+1}) − (g_t(L_t) − f_t(L_t))|
= |g_{t+1}(L_{t+1}) − g_t(L_t) − (f_{t+1}(L_{t+1}) − f_t(L_t))|
≤ |g_{t+1}(L_{t+1}) − g_t(L_t)| + |f_{t+1}(L_{t+1}) − f_t(L_t)|
≤ |g_{t+1}(L_{t+1}) − g_{t+1}(L_t)| + |g_{t+1}(L_t) − g_t(L_t)| + |f_{t+1}(L_{t+1}) − f_{t+1}(L_t)| + |f_{t+1}(L_t) − f_t(L_t)|
≤ κ2 ‖L_{t+1} − L_t‖_F.      (57)

According to Lemma A.5, combined with Σ_{t=1}^{∞} 1/(t + 1) = ∞, we have

lim_{t→∞} g_t(L_t) − f_t(L_t) = 0, a.s.      (58)

Theorem C.4. As t tends to infinity, given that L_t is obtained by Algorithm 1, g_t(L_t) − f(L_t) converges to 0 almost surely.

Proof. From Theorem C.2, ‖f − f_t‖_∞ converges to 0 almost surely, which implies

lim_{t→∞} g_t(L_t) − f(L_t) = 0, a.s.      (59)


REFERENCES

[1] G. Cheng and J. Han, “A survey on object detection in optical remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 117, pp. 11–28, 2016.

[2] G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang, “DOTA: A large-scale dataset for object detection in aerial images,” in IEEE CVPR, 2018.

[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.

[4] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.

[5] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Advances in Neural Information Processing Systems, 2015, pp. 91–99.

[6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.

[7] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in European Conference on Computer Vision. Springer, 2016, pp. 21–37.

[8] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, “Accurate object localization in remote sensing images based on convolutional neural networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 5, pp. 2486–2498, 2017.

[9] K. Li, G. Cheng, S. Bu, and X. You, “Rotation-insensitive and context-augmented object detection in remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 4, pp. 2337–2348, 2017.

[10] P. Ding, Y. Zhang, W.-J. Deng, P. Jia, and A. Kuijper, “A light and faster regional convolutional neural network for object detection in optical remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 141, pp. 208–218, 2018.

[11] W. Liu, L. Ma, and H. Chen, “Arbitrary-oriented ship detection framework in optical remote-sensing images,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 6, pp. 937–941, 2018.

[12] Y. Luo, L. Zhou, S. Wang, and Z. Wang, “Video satellite imagery super resolution via convolutional neural networks,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 12, pp. 2398–2402, 2017.

[13] P. Team, “Planet application program interface: In space for life on Earth,” San Francisco, CA, 2016.

[14] L. Mou and X. X. Zhu, “Spatiotemporal scene interpretation of space videos via deep neural network and tracklet analysis,” in Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International. IEEE, 2016, pp. 1823–1826.

[15] B. Du, Y. Sun, S. Cai, C. Wu, and Q. Du, “Object tracking in satellite videos by fusing the kernel correlation filter and the three-frame-difference algorithm,” IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 2, pp. 168–172, 2018.

[16] J. Zhang, X. Jia, J. Hu, and K. Tan, “Satellite multi-vehicle tracking under inconsistent detection conditions by bilevel k-shortest paths optimization,” in 2018 Digital Image Computing: Techniques and Applications (DICTA). IEEE, 2018, pp. 1–8.

[17] B. Uzkent, A. Rangnekar, and M. J. Hoffman, “Tracking in aerial hyperspectral videos using deep kernelized correlation filters,” IEEE Transactions on Geoscience and Remote Sensing, no. 99, pp. 1–13, 2018.

[18] T. Bouwmans and E. H. Zahzah, “Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance,” Computer Vision and Image Understanding, vol. 122, pp. 22–34, 2014.

[19] T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, and E.-H. Zahzah, “Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset,” Computer Science Review, vol. 23, pp. 1–71, 2017.

[20] T. Bouwmans, S. Javed, H. Zhang, Z. Lin, and R. Otazo, “On the applications of robust PCA in image and video processing,” Proceedings of the IEEE, vol. 106, no. 8, pp. 1427–1457, 2018.

[21] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” Journal of the ACM (JACM), vol. 58, no. 3, p. 11, 2011.

[22] Z. Lin, R. Liu, and Z. Su, “Linearized alternating direction method with adaptive penalty for low-rank representation,” in Advances in Neural Information Processing Systems, 2011, pp. 612–620.

[23] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma, “Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization,” in Advances in Neural Information Processing Systems, 2009, pp. 2080–2088.

[24] T. Zhou and D. Tao, “GoDec: Randomized low-rank & sparse matrix decomposition in noisy case,” in International Conference on Machine Learning. Omnipress, 2011.

[25] N. Wang, T. Yao, J. Wang, and D.-Y. Yeung, “A probabilistic approach to robust matrix factorization,” in European Conference on Computer Vision. Springer, 2012, pp. 126–139.

[26] Y. Xu, Z. Wu, J. Chanussot, M. Dalla Mura, A. L. Bertozzi, and Z. Wei, “Low-rank decomposition and total variation regularization of hyperspectral video sequences,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 3, pp. 1680–1694, 2017.

[27] X. Zhou, C. Yang, and W. Yu, “Moving object detection by detecting contiguous outliers in the low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 3, pp. 597–610, 2013.

[28] R. Jenatton, J.-Y. Audibert, and F. Bach, “Structured variable selection with sparsity-inducing norms,” Journal of Machine Learning Research, vol. 12, no. Oct, pp. 2777–2824, 2011.

[29] X. Liu, G. Zhao, J. Yao, and C. Qi, “Background subtraction based on low-rank and structured sparse decomposition,” IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2502–2514, 2015.

[30] J. Zhang, X. Jia, and J. Hu, “Error bounded foreground and background modeling for moving object detection in satellite videos,” arXiv preprint arXiv:1908.09539, 2019.

[31] J. He, L. Balzano, and A. Szlam, “Incremental gradient on the Grassmannian for online foreground and background separation in subsampled video,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 1568–1575.

[32] J. Feng, H. Xu, and S. Yan, “Online robust PCA via stochastic optimization,” in Advances in Neural Information Processing Systems, 2013, pp. 404–412.

[33] M. Shakeri and H. Zhang, “COROLA: A sequential solution to moving object detection using low-rank approximation,” Computer Vision and Image Understanding, vol. 146, pp. 27–39, 2016.

[34] J. Xu, V. K. Ithapu, L. Mukherjee, J. M. Rehg, and V. Singh, “GOSUS: Grassmannian online subspace updates with structured-sparsity,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 3376–3383.

[35] S. Javed, A. Mahmood, S. Al-Maadeed, T. Bouwmans, and S. K. Jung, “Moving object detection in complex scene using spatiotemporal structured-sparse RPCA,” IEEE Transactions on Image Processing, vol. 28, no. 2, pp. 1007–1022, 2019.

[36] P. Turaga, A. Veeraraghavan, and R. Chellappa, “Statistical analysis on Stiefel and Grassmann manifolds with applications in computer vision,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2008, pp. 1–8.

[37] L. Balzano, R. Nowak, and B. Recht, “Online identification and tracking of subspaces from highly incomplete information,” in 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2010, pp. 704–711.

[38] M. Harandi, C. Sanderson, C. Shen, and B. C. Lovell, “Dictionary learning and sparse coding on Grassmann manifolds: An extrinsic solution,” in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 3120–3127.

[39] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010.

[40] P. Sprechmann, A. M. Bronstein, and G. Sapiro, “Learning efficient sparse and low rank models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1821–1833, 2015.

[41] J. Shen, P. Li, and H. Xu, “Online low-rank subspace clustering by basis dictionary pursuit,” in International Conference on Machine Learning, 2016, pp. 622–631.

[42] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix factorization and sparse coding,” Journal of Machine Learning Research, vol. 11, no. Jan, pp. 19–60, 2010.

[43] E. Dohmatob, A. Mensch, G. Varoquaux, and B. Thirion, “Learning brain regions via large-scale online structured sparse dictionary learning,” in Advances in Neural Information Processing Systems, 2016, pp. 4610–4618.

[44] R. Jenatton, J. Mairal, G. Obozinski, and F. R. Bach, “Proximal methods for sparse hierarchical dictionary learning,” in ICML, vol. 1. Citeseer, 2010, p. 2.


[45] K. Jia, T.-H. Chan, and Y. Ma, “Robust and practical face recognition via structured sparsity,” in European Conference on Computer Vision. Springer, 2012, pp. 331–344.

[46] S. J. Wright, “Coordinate descent algorithms,” Mathematical Programming, vol. 151, no. 1, pp. 3–34, 2015.

[47] J. Mairal, R. Jenatton, F. R. Bach, and G. R. Obozinski, “Network flow algorithms for structured sparsity,” in Advances in Neural Information Processing Systems, 2010, pp. 1558–1566.

[48] J. Mairal, R. Jenatton, G. Obozinski, and F. Bach, “Convex and network flow optimization for structured sparsity,” Journal of Machine Learning Research, vol. 12, no. Sep, pp. 2681–2720, 2011.

[49] Z. Gao, L.-F. Cheong, and M. Shan, “Block-sparse RPCA for consistent foreground detection,” in European Conference on Computer Vision. Springer, 2012, pp. 690–703.

[50] Z. Lin, M. Chen, and Y. Ma, “The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices,” arXiv preprint arXiv:1009.5055, 2010.

[51] D. P. Bertsekas, “Nonlinear programming,” Journal of the Operational Research Society, vol. 48, no. 3, pp. 334–334, 1997.

[52] S. Shalev-Shwartz et al., “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.

[53] A. Bensoussan, J.-L. Lions, and G. Papanicolaou, Asymptotic Analysis for Periodic Structures. American Mathematical Soc., 2011, vol. 374.

Junpeng Zhang received the B.Sci. degree from the China University of Mining and Technology, Xuzhou, China, in 2013 and the Master's degree in surveying engineering from the same university in 2016. He is currently pursuing the Ph.D. degree in electrical engineering at The University of New South Wales, Australia. His research interests include object detection and tracking in remote sensing imagery. He was the winner of the “DSTG Best Contribution to Science Award” at Digital Image Computing: Techniques and Applications 2018 (DICTA 2018).

Xiuping Jia (M'93–SM'03) received the B.Eng. degree from the Beijing University of Posts and Telecommunications, Beijing, China, in 1982 and the Ph.D. degree in electrical engineering from The University of New South Wales, Australia, in 1996. Since 1988, she has been with the School of Engineering and Information Technology, The University of New South Wales at Canberra, Australia, where she is currently an Associate Professor. Her research interests include remote sensing, image processing and spatial data analysis. Dr. Jia has authored or coauthored more than 200 refereed papers, including over 100 journal papers, with an h-index of 34 and an i10-index of 102. She has co-authored the remote sensing textbook titled Remote Sensing Digital Image Analysis [Springer-Verlag, 3rd (1999) and 4th eds. (2006)]. She is a Subject Editor for the Journal of Soils and Sediments and an Associate Editor of the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING.

Jiankun Hu received the Ph.D. degree in control engineering from the Harbin Institute of Technology, China, in 1993, and the Master's degree in computer science and software engineering from Monash University, Australia, in 2000. He was a Research Fellow with Delft University, The Netherlands, from 1997 to 1998, and The University of Melbourne, Australia, from 1998 to 1999. He is a full Professor of Cyber Security at the School of Engineering and Information Technology, The University of New South Wales at Canberra, Australia. His main research interest is in the field of cyber security, including biometrics security, where he has published many papers in high-quality conferences and journals, including the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE. He has served on the editorial boards of up to seven international journals and served as a Security Symposium Chair of the IEEE flagship conferences IEEE ICC and IEEE GLOBECOM. He has obtained nine Australian Research Council (ARC) Grants. He served on the prestigious Panel of Mathematics, Information and Computing Sciences of the ARC ERA (Excellence in Research for Australia) Evaluation Committee in 2012.

Jocelyn Chanussot (M'04–SM'04–F'12) received the M.Sc. degree in electrical engineering from the Grenoble Institute of Technology (Grenoble INP), Grenoble, France, in 1995, and the Ph.D. degree from the Université de Savoie, Annecy, France, in 1998. In 1999, he was with the Geography Imagery Perception Laboratory for the Délégation Générale de l'Armement (French National Defense Department). Since 1999, he has been with Grenoble INP, where he is currently a Professor of signal and image processing. He has been a Visiting Scholar with Stanford University, Stanford, CA, USA; KTH, Stockholm, Sweden; and NUS, Singapore. Since 2013, he has been an Adjunct Professor with the University of Iceland, Reykjavik, Iceland. From 2015 to 2017, he was a Visiting Professor with the University of California at Los Angeles, Los Angeles, CA, USA. He is conducting his research at GIPSA-Lab. His research interests include image analysis, multicomponent image processing, nonlinear filtering, and data fusion in remote sensing.

Dr. Chanussot was a member of the IEEE Geoscience and Remote Sensing Society AdCom from 2009 to 2010, in charge of membership development, and of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society from 2006 to 2008. He was a member of the Institut Universitaire de France from 2012 to 2017. He was the General Chair of the first IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing. He was the Chair from 2009 to 2011 and the Co-Chair of the GRS Data Fusion Technical Committee from 2005 to 2008. He was the Program Chair of the IEEE International Workshop on Machine Learning for Signal Processing in 2009. He is the Founding President of the IEEE Geoscience and Remote Sensing French Chapter (2007 to 2010), which received the 2010 IEEE GRSS Chapter Excellence Award. He was a co-recipient of the NORSIG 2006 Best Student Paper Award, the IEEE GRSS 2011 and 2015 Symposium Best Paper Award, the IEEE GRSS 2012 Transactions Prize Paper Award, and the IEEE GRSS 2013 Highest Impact Paper Award. He was an Associate Editor for IEEE GEOSCIENCE AND REMOTE SENSING LETTERS from 2005 to 2007 and Pattern Recognition from 2006 to 2008. He was the Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING from 2011 to 2015. Since 2007, he has been an Associate Editor for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, and since 2018, he has also been an Associate Editor for the IEEE TRANSACTIONS ON IMAGE PROCESSING. He was the Guest Editor for the PROCEEDINGS OF THE IEEE in 2013 and IEEE SIGNAL PROCESSING MAGAZINE in 2014.