
Int J Comput Vis
DOI 10.1007/s11263-009-0267-4

Quasi-perspective Projection Model: Theory and Application to Structure and Motion Factorization from Uncalibrated Image Sequences

    Guanghui Wang · Q.M. Jonathan Wu

Received: 1 December 2008 / Accepted: 29 June 2009
© Springer Science+Business Media, LLC 2009

Abstract This paper addresses the problem of factorization-based 3D reconstruction from uncalibrated image sequences. Previous studies on structure and motion factorization are either based on the simplified affine assumption or on general perspective projection. The affine approximation is widely adopted due to its simplicity, whereas the extension to the perspective model suffers from recovering projective depths. To fill the gap between the simplicity of the affine and the accuracy of the perspective model, we propose a quasi-perspective projection model for structure and motion recovery of rigid and nonrigid objects based on the factorization framework. The novelty and contribution of this paper are as follows. Firstly, under the assumption that the camera is far away from the object with small lateral rotations, we prove that the imaging process can be modeled by quasi-perspective projection, which is more accurate than the affine model from both geometrical error analysis and experimental studies. Secondly,

The work is supported in part by the Natural Sciences and Engineering Research Council of Canada, and the National Natural Science Foundation of China under Grant No. 60575015.

Electronic supplementary material The online version of this article (http://dx.doi.org/10.1007/s11263-009-0267-4) contains supplementary material, which is available to authorized users.

G. Wang (✉) · Q.M.J. Wu
Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset, Windsor, N9B 3P4, Ontario, Canada
e-mail: [email protected]

Q.M.J. Wu
e-mail: [email protected]

G. Wang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China

we apply the model to establish a framework of rigid and nonrigid factorization under the quasi-perspective assumption. Finally, we propose an Extended Cholesky Decomposition to recover the rotation part of the Euclidean upgrading matrix. We also prove that the last column of the upgrading matrix corresponds to a global scale and translation of the camera and thus may be set freely. The proposed method is validated and evaluated extensively on synthetic and real image sequences, and improved results over existing schemes are observed.

Keywords Structure from motion · Computational models of vision · Quasi-perspective projection · Imaging geometry · Matrix factorization · Singular value decomposition · Euclidean reconstruction

    1 Introduction

The problem of structure and motion recovery from image sequences is an important theme in computer vision. Great progress has been made for different applications during the last two decades (Hartley and Zisserman 2004). Among these methods, the factorization based approach, owing to its robust behavior and accuracy, has been widely studied since it deals uniformly with the data of all images (Poelman and Kanade 1997; Quan 1996; Tomasi and Kanade 1992; Triggs 1996). The factorization algorithm was first proposed by Tomasi and Kanade (1992) in the early 90's. The main idea of this algorithm is to factorize the tracking matrix into motion and structure matrices simultaneously by singular value decomposition (SVD) with low-rank approximation. The algorithm assumes an orthographic projection model. It was extended to weak perspective and paraperspective projection by Poelman and Kanade (1997). The orthographic,



weak perspective, and paraperspective projections can be generalized as the affine camera model.

More generally, Christy and Horaud (1996) extended the above methods to a perspective camera model by incrementally performing the factorization under the affine assumption. The method is an affine approximation to full perspective projection. Triggs (1996) and Sturm and Triggs (1996) proposed a full projective reconstruction method via rank-4 factorization of a scaled tracking matrix with projective depths recovered from pairwise epipolar geometry. The method was further studied in Han and Kanade (2000), Heyden et al. (1999), Mahamud and Hebert (2000), where different iterative schemes were proposed to recover the projective depths through minimizing reprojection errors. Recently, Oliensis and Hartley (2007) provided a complete theoretical convergence analysis for the iterative extensions. Unfortunately, no iteration has been shown to converge sensibly, so they proposed a simple extension, called CIESTA, to give a reliable initialization to other algorithms.

The above methods work only for rigid objects and static scenes, whereas in the real world many scenarios are nonrigid or dynamic, such as articulated motion, human faces carrying different expressions, lip movements, hand gestures, and moving vehicles. In order to deal with such situations, many extensions stemming from the factorization algorithm were proposed to relax the rigidity constraint. Costeira and Kanade (1998) first discussed how to recover the motion and shape of several independently moving objects via factorization using orthographic projection. Bascle and Blake (1998) proposed a method for factorizing facial expressions and poses based on a set of preselected basis images. Recently, Li et al. (2007) proposed to segment multiple rigid-body motions from point correspondences via subspace separation. Yan and Pollefeys (2005, 2008) proposed a factorization-based approach to recover the structure and kinematic chain of articulated objects.

In the pioneering work by Bregler et al. (2000), it was demonstrated that the 3D shape of a nonrigid object can be expressed as a weighted linear combination of a set of shape bases. The shape bases and camera motions are then factorized simultaneously for all time instants under the rank constraint of the tracking matrix. Following this idea, the method was extensively investigated and developed by many researchers, such as Brand (2001, 2005), Del Bue et al. (2006, 2004), Torresani et al. (2008, 2001), Xiao et al. (2006), and Xiao and Kanade (2005). Recently, Rabaud and Belongie (2008) relaxed Bregler's assumption (2000) by assuming that only small neighborhoods of shapes are well modeled by a linear subspace, and proposed a novel approach that solves the problem within a manifold-learning framework.

Most nonrigid factorization methods are based on the affine camera model due to its simplicity. It was extended to perspective projection in Xiao and Kanade (2005) by iteratively recovering the projective depths. The perspective factorization is more complicated and does not guarantee convergence to the correct depths, especially for nonrigid scenarios (Hartley and Zisserman 2004). Vidal and Abretske (2006) showed that the constraints among multiple views of a nonrigid shape consisting of k shape bases can be reduced to multilinear constraints, and presented a closed-form solution to the reconstruction of a nonrigid shape consisting of two shape bases. Hartley and Vidal (2008) proposed a closed-form solution to nonrigid shape and motion with calibrated cameras or fixed intrinsic parameters. Since the factorization is only defined up to a nonsingular transformation matrix, many researchers adopt metric constraints to recover the matrix and upgrade the factorization to Euclidean space (Brand 2001; Bregler et al. 2000; Del Bue et al. 2004; Torresani et al. 2001). However, the rotation constraint may cause ambiguity in the combination of shape bases. Xiao et al. (2006) proposed a basis constraint to resolve the ambiguity and provided a closed-form solution.

The essence of the factorization algorithm is to find a low-rank approximation of the tracking matrix. Most algorithms adopt SVD to compute the approximation. Alternatively, Hartley and Schaffalizky (2003) proposed to use power factorization (PF) to find the low-rank approximation, which can handle missing data in a tracking matrix. It was extended to nonrigid factorization in both metric space (Wang et al. 2008) and affine space (Wang and Wu 2008a). Vidal et al. (2008) proposed to combine the PF algorithm with motion segmentation. There are also nonlinear studies that deal with an incomplete tracking matrix with some entries unavailable, such as the Damped Newton method (Buchanan and Fitzgibbon 2005) and a Levenberg-Marquardt based method (Chen 2008). Torresani et al. (2008) proposed a Probabilistic Principal Components Analysis algorithm to estimate the 3D shape and motion with missing data. Camera calibration is an indispensable step in retrieving 3D metric information from 2D images. Many self-calibration algorithms were proposed to calibrate fixed camera parameters (Maybank and Faugeras 1992; Hartley 1997; Luong and Faugeras 1997), varying parameters (Heyden and Åström 1997; Pollefeys et al. 1999), and affine camera models (Quan 1996).

Previous studies on factorization are either based on the affine camera model or on perspective projection. The affine assumption is widely adopted due to its simplicity, although it is just an approximation to the real imaging process, whereas the extension to the perspective model suffers from the recovery of the projective depths, which is computationally intensive and has no guaranteed convergence. In this paper, we make a trade-off between the simplicity of affine and the accuracy of full perspective projection and propose a novel framework for the problem. Assuming that the camera is far away from the object with small lateral rotations, which is


similar to the affine assumption and easily satisfied in practice, we propose a quasi-perspective projection model and give an error analysis of different projection models. The model is proved to be more accurate than the affine approximation, since the projective depths are implicitly embedded in the shape matrix, while its computational complexity is similar to that of affine. We apply this model to the factorization algorithm and establish a framework of rigid and nonrigid factorization under quasi-perspective projection. We elaborate the computational details of the recovery of the Euclidean upgrading matrix. To the best of our knowledge, there is no similar report in the literature. The idea was first proposed in CVPR 2008 (Wang and Wu 2008b); we present more theoretical analysis and experimental evaluations in this paper.

The remaining part of the paper is organized as follows. The definition of and background on the factorization algorithm is given in Sect. 2. The proposed quasi-perspective model and error analysis are elaborated in Sect. 3. The application to rigid factorization under the proposed model is detailed in Sect. 4. The quasi-perspective nonrigid factorization is presented in Sect. 5. Extensive experimental evaluations on synthetic data are given in Sect. 6. Some test results on real image sequences are reported in Sect. 7. Finally, the concluding remarks are presented in Sect. 8.

    2 Background on Factorization

    2.1 Problem Definition

Under perspective projection, a 3D point X_j is projected onto an image point x_ij in frame i according to

λ_{ij} x_{ij} = P_i X_j = K_i [R_i, T_i] X_j    (1)

where λ_ij is a non-zero scale factor, commonly called the projective depth; x_ij = [u_ij, v_ij, 1]^T and X_j = [x_j, y_j, z_j, 1]^T are expressed in homogeneous form; P_i is the projection matrix of the i-th frame; R_i and T_i are the corresponding rotation matrix and translation vector of the camera with respect to the world system; K_i is the camera calibration matrix of the form

K_i = \begin{bmatrix} f_i & ς_i & u_{0i} \\ 0 & κ_i f_i & v_{0i} \\ 0 & 0 & 1 \end{bmatrix}    (2)

where f_i represents the camera's focal length; [u_{0i}, v_{0i}]^T are the coordinates of the camera's principal point; ς_i refers to the skew factor; κ_i is called the aspect ratio of the camera. For some precise industrial CCD cameras, we may assume zero skew, known principal point, and unit aspect ratio, i.e., ς_i = 0, u_{0i} = v_{0i} = 0, and κ_i = 1. Then the camera is simplified to have only one intrinsic parameter.

When the distance of an object from a camera is much greater than the depth variation of the object, we may assume an affine camera model. Under the affine assumption, the last row of the projection matrix is of the form P_{3i}^T ≃ [0, 0, 0, 1], where '≃' denotes equality up to scale. Then the projection process (1) can be simplified by removing the scale factor λ_ij:

x̄_ij = A_i X̄_j + T̄_i    (3)

where A_i ∈ R^{2×3} is composed of the upper-left 2 × 3 submatrix of P_i; x̄_ij = [u_ij, v_ij]^T and X̄_j = [x_j, y_j, z_j]^T are the non-homogeneous forms of x_ij and X_j, respectively; T̄_i is the corresponding translation vector, which is actually the image of the world origin. Under affine projection, it is easy to verify that the centroid of a set of space points is projected to the centroid of their images. Therefore, the translation term vanishes if all the image points in each frame are registered to the corresponding centroid, and the projection is simplified to the form

x̄_ij = A_i X̄_j    (4)

The problem of structure from motion is defined as follows. Given n tracked feature points of an object across a sequence of m frames {x_ij | i = 1, ..., m, j = 1, ..., n}, we want to recover the structure S_ij = {X_ij | i = 1, ..., m, j = 1, ..., n} and motion {R_i, T_i} of the object. The factorization based algorithm has proved to be an effective method for this problem. As shown in Table 1, the algorithms can generally be classified into the following four categories according to the camera assumption and object property: (i) rigid object under affine assumption; (ii) rigid object under perspective projection; (iii) nonrigid object under affine assumption; (iv) nonrigid object under perspective projection. In Table 1, 'Quasi-Persp' stands for the quasi-perspective projection model to be discussed in this paper. The meaning of the symbols W, M, S, B, and H in the table is defined in the following subsections.

    2.2 Rigid Factorization

Under the affine assumption (4), the projection from space to the sequence is expressed as

\underbrace{\begin{bmatrix} x̄_{11} & ⋯ & x̄_{1n} \\ ⋮ & ⋱ & ⋮ \\ x̄_{m1} & ⋯ & x̄_{mn} \end{bmatrix}}_{W_{2m×n}} = \underbrace{\begin{bmatrix} A_1 \\ ⋮ \\ A_m \end{bmatrix}}_{M_{2m×3}} \underbrace{[X̄_1, …, X̄_n]}_{S̄_{3×n}}    (5)

where W is called the tracking matrix; M and S̄ are called the motion matrix and shape matrix, respectively. It is evident that the rank of the tracking matrix is at most 3.


Table 1 Classification of structure and motion factorization of rigid and nonrigid objects

Classification            Tracking matrix   Motion matrix        Shape matrix        Upgrading matrix
Rigid     Affine          W ∈ R^{2m×n}      M ∈ R^{2m×3}         S̄ ∈ R^{3×n}         H ∈ R^{3×3}
          Perspective     Ẇ ∈ R^{3m×n}      M ∈ R^{3m×4}         S ∈ R^{4×n}         H ∈ R^{4×4}
          Quasi-Persp     W ∈ R^{3m×n}      M ∈ R^{3m×4}         S ∈ R^{4×n}         H ∈ R^{4×4}
Nonrigid  Affine          W ∈ R^{2m×n}      M ∈ R^{2m×3k}        B̄ ∈ R^{3k×n}        H ∈ R^{3k×3k}
          Perspective     Ẇ ∈ R^{3m×n}      M ∈ R^{3m×(3k+1)}    B ∈ R^{(3k+1)×n}    H ∈ R^{(3k+1)×(3k+1)}
          Quasi-Persp     W ∈ R^{3m×n}      M ∈ R^{3m×(3k+1)}    B ∈ R^{(3k+1)×n}    H ∈ R^{(3k+1)×(3k+1)}

The rank constraint can be easily imposed by performing SVD on the tracking matrix W and truncating it to rank 3. However, the decomposition is not unique, since it is only defined up to a nonsingular linear transformation matrix H ∈ R^{3×3} as W = (MH)(H^{-1}S̄). Actually, the decomposition is just one of the affine reconstructions of the object. By inserting H into the factorization, we can upgrade the reconstruction from affine to Euclidean space. We will therefore also call this matrix the (Euclidean) upgrading matrix in the following. Many researchers utilize the metric constraints on the motion matrix to recover it (Poelman and Kanade 1997; Quan 1996), which is in fact a self-calibration process under the constraints of simplified camera parameters.
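
To make the truncation step concrete, here is a minimal NumPy sketch of the rank-3 factorization; the function name and the synthetic data are ours, not from the paper:

```python
import numpy as np

def rank_truncate(W, rank=3):
    """Best rank-`rank` factorization of a tracking matrix in the
    Frobenius norm (Eckart-Young), splitting the singular values
    evenly between the two factors."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    M_hat = U[:, :rank] * sqrt_s          # motion matrix, up to H
    S_hat = sqrt_s[:, None] * Vt[:rank]   # shape matrix, up to H^-1
    return M_hat, S_hat

# Synthetic noise-free example: m = 4 views, n = 20 registered points.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 20))
M_hat, S_hat = rank_truncate(W)
print(np.allclose(M_hat @ S_hat, W))      # True: W has exact rank 3
```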

When the perspective projection model (1) is adopted, the factorization equation can be formulated as

\underbrace{\begin{bmatrix} λ_{11}x_{11} & ⋯ & λ_{1n}x_{1n} \\ ⋮ & ⋱ & ⋮ \\ λ_{m1}x_{m1} & ⋯ & λ_{mn}x_{mn} \end{bmatrix}}_{Ẇ_{3m×n}} = \underbrace{\begin{bmatrix} P_1 \\ ⋮ \\ P_m \end{bmatrix}}_{M_{3m×4}} \underbrace{\begin{bmatrix} X̄_1, & …, & X̄_n \\ 1, & …, & 1 \end{bmatrix}}_{S_{4×n}}    (6)

where Ẇ is called the projective-depth-scaled tracking matrix, and its rank is at most 4 if a consistent set of scalars λ_ij is present; M and S are the camera matrix and homogeneous shape matrix, respectively. Obviously, any such factorization corresponds to a valid projective reconstruction, which is defined up to a projective transformation matrix H ∈ R^{4×4}. We can still use the metric constraint to recover the upgrading matrix.

The most difficult part of perspective factorization is to recover the projective depths so that they are consistent with (1). One method is to estimate the depths pairwise from the fundamental matrix and then string them together (Sturm and Triggs 1996; Triggs 1996). The disadvantage of this approach is the computational cost and possible error accumulation. Another method is to start with initial depths λ_ij = 1 and iteratively refine the depths by reprojection (Han and Kanade 2000; Hartley and Zisserman 2004; Mahamud and Hebert 2000). However, there is no guarantee that the procedure will converge to a global minimum; as recently proved in Oliensis and Hartley (2007), no iteration has been shown to converge sensibly.
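
For illustration, the following sketch implements one common variant of this iterative scheme, starting from λ_ij = 1 and re-estimating the depths from the third coordinate of the reprojection; it is a generic sketch, not the exact procedure of any of the cited papers:

```python
import numpy as np

def perspective_factorize(x, n_iter=10):
    """Iterative projective factorization sketch.

    x : (m, n, 3) array of homogeneous image points.
    Alternates between (i) rank-4 factorization of the depth-scaled
    tracking matrix of eq. (6) and (ii) re-estimation of the depths.
    """
    m, n, _ = x.shape
    lam = np.ones((m, n))                      # affine initialization
    for _ in range(n_iter):
        # Build the 3m x n depth-scaled tracking matrix.
        W = (lam[:, :, None] * x).transpose(0, 2, 1).reshape(3 * m, n)
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        P = U[:, :4] * np.sqrt(s[:4])          # 3m x 4 stacked cameras
        X = np.sqrt(s[:4])[:, None] * Vt[:4]   # 4 x n points
        # New depth estimate: third coordinate of each reprojection.
        proj = (P @ X).reshape(m, 3, n)
        lam = proj[:, 2, :]
        lam /= lam.mean()                      # fix the global scale
    return P, X, lam
```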

    2.3 Nonrigid Factorization

When an object is nonrigid, many studies follow Bregler's assumption (Bregler et al. 2000) that the nonrigid structure can be approximated by a linearly weighted combination of k rigid shape bases:

S̄_i = \sum_{l=1}^{k} ω_{il} B_l    (7)

where B_l ∈ R^{3×n} is a shape basis that embodies a principal mode of the deformation, and ω_il ∈ R is called the deformation weight. Under this assumption and the affine camera model, the nonrigid factorization is modeled as

\underbrace{\begin{bmatrix} x̄_{11} & ⋯ & x̄_{1n} \\ ⋮ & ⋱ & ⋮ \\ x̄_{m1} & ⋯ & x̄_{mn} \end{bmatrix}}_{W_{2m×n}} = \underbrace{\begin{bmatrix} ω_{11}A_1 & ⋯ & ω_{1k}A_1 \\ ⋮ & ⋱ & ⋮ \\ ω_{m1}A_m & ⋯ & ω_{mk}A_m \end{bmatrix}}_{M_{2m×3k}} \underbrace{\begin{bmatrix} B_1 \\ ⋮ \\ B_k \end{bmatrix}}_{B̄_{3k×n}}    (8)

We call M the nonrigid motion matrix and B̄ the nonrigid shape matrix, which is composed of the k shape bases. It is easy to see from (8) that the rank of the nonrigid tracking matrix W is at most 3k. The decomposition can be achieved by SVD with the rank-3k constraint, and it is defined up to a nonsingular upgrading matrix H ∈ R^{3k×3k}. If this matrix is known, A_i, ω_il, and S̄_i can be recovered accordingly from M and B̄. The computation of H here is more complicated than in the rigid case. Many researchers (Brand 2001; Del Bue et al. 2004; Torresani et al. 2001) adopted the metric constraints on the motion matrix. However, the constraints may be insufficient when the object deforms at varying speed. Xiao et al. (2006) proposed a basis constraint to resolve this ambiguity.


Similarly, the factorization under perspective projection can be formulated as follows (Xiao and Kanade 2005):

Ẇ_{3m×n} = \underbrace{\begin{bmatrix} ω_{11}P_1^{(1:3)} & ⋯ & ω_{1k}P_1^{(1:3)} & P_1^{(4)} \\ ⋮ & ⋱ & ⋮ & ⋮ \\ ω_{m1}P_m^{(1:3)} & ⋯ & ω_{mk}P_m^{(1:3)} & P_m^{(4)} \end{bmatrix}}_{M_{3m×(3k+1)}} \underbrace{\begin{bmatrix} B_1 \\ ⋮ \\ B_k \\ 1^T \end{bmatrix}}_{B_{(3k+1)×n}}    (9)

where Ẇ is the depth-scaled tracking matrix as in (6); P_i^{(1:3)} and P_i^{(4)} denote the first three columns and the fourth column of P_i, respectively; and 1 = [1, …, 1]^T is an n-vector with unit entries. The rank of the correctly scaled tracking matrix is at most 3k + 1. The decomposition is defined up to a transformation H ∈ R^{(3k+1)×(3k+1)}, which can be determined in a similar but more complicated way. Just as in the rigid case, the most difficult part of nonrigid perspective factorization is to determine the projective depths. Since there is no pairwise fundamental matrix for deformable features, we can only use the iterative method to recover the depths, although it is more likely to get stuck in a local minimum in the nonrigid situation.

    3 Quasi-perspective Projection

In this section, we propose a new quasi-perspective projection model to fill the gap between the simplicity of the affine camera and the accuracy of perspective projection.

    3.1 Quasi-perspective Projection

Under perspective projection, the image formation process is shown in Fig. 1. In order to ensure a large overlapping part of the object to be reconstructed, the camera usually undergoes small movements across adjacent views, especially for images of a video sequence. Suppose O_w − X_wY_wZ_w is the world coordinate system, selected on the object to be reconstructed, and O_i − X_iY_iZ_i is the camera system, with O_i being the optical center of the camera. Without loss of generality, we assume there is a reference camera system O_r − X_rY_rZ_r. As the world system can be set freely, we align it with the reference frame as illustrated in Fig. 1. Therefore, the rotation R_i of frame i with respect to the reference frame is the same as the rotation of the camera with respect to the world system.

Definition 1 (Axial and lateral rotation) The orientation of a camera is usually described by roll-pitch-yaw angles. For the i-th frame, we define the pitch, yaw, and roll as the rotations α_i, β_i, and γ_i of the camera with respect to the X_w, Y_w, and Z_w axes of the world system. As shown in Fig. 1, the optical axes of the cameras usually point towards the object. For convenience of discussion, we define γ_i as the axial rotation angle, and α_i, β_i as the lateral rotation angles.

Proposition 2 Suppose the camera undergoes small lateral rotation with respect to the reference frame. Then the variation of the projective depth λ_ij is mainly proportional to the depth of the space point, and the projective depths of a point at different views have a similar trend of variation.

Proof Suppose the rotation and translation of the i-th frame with respect to the world system are R_i = [r_{1i}, r_{2i}, r_{3i}]^T and T_i = [t_{xi}, t_{yi}, t_{zi}]^T, respectively. Then the projection matrix can be written as

P_i = K_i [R_i, T_i] = \begin{bmatrix} f_i r_{1i}^T + ς_i r_{2i}^T + u_{0i} r_{3i}^T & f_i t_{xi} + ς_i t_{yi} + u_{0i} t_{zi} \\ κ_i f_i r_{2i}^T + v_{0i} r_{3i}^T & κ_i f_i t_{yi} + v_{0i} t_{zi} \\ r_{3i}^T & t_{zi} \end{bmatrix}    (10)

Let us decompose the rotation matrix into the rotations around the three axes as R(γ_i)R(β_i)R(α_i). Then we have

R_i = R(γ_i)R(β_i)R(α_i)
    = \begin{bmatrix} Cγ_i & −Sγ_i & 0 \\ Sγ_i & Cγ_i & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} Cβ_i & 0 & Sβ_i \\ 0 & 1 & 0 \\ −Sβ_i & 0 & Cβ_i \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & Cα_i & −Sα_i \\ 0 & Sα_i & Cα_i \end{bmatrix}
    = \begin{bmatrix} Cγ_i Cβ_i & Cγ_i Sβ_i Sα_i − Sγ_i Cα_i & Cγ_i Sβ_i Cα_i + Sγ_i Sα_i \\ Sγ_i Cβ_i & Sγ_i Sβ_i Sα_i + Cγ_i Cα_i & Sγ_i Sβ_i Cα_i − Cγ_i Sα_i \\ −Sβ_i & Cβ_i Sα_i & Cβ_i Cα_i \end{bmatrix}    (11)


Fig. 1 Imaging process of an object. (a) Camera setup with respect to the object. (b) The relationship of the world coordinate system and the camera systems at different viewpoints

where 'S' stands for the sine function and 'C' for the cosine function. By inserting (10) and (11) into (1), we have

λ_ij = [r_{3i}^T, t_{zi}] X_j = −(Sβ_i)x_j + (Cβ_i Sα_i)y_j + (Cβ_i Cα_i)z_j + t_{zi}    (12)

From Fig. 1, we know that the rotation angles α_i, β_i, γ_i of the camera with respect to the world system are the same as those with respect to the reference frame. Under small lateral rotations, i.e., small angles α_i and β_i, we have Sβ_i ≪ Cβ_i Cα_i and Cβ_i Sα_i ≪ Cβ_i Cα_i. Thus (12) can be approximated by

λ_ij ≈ (Cβ_i Cα_i) z_j + t_{zi}    (13)

All features {x_ij | j = 1, …, n} in the i-th frame correspond to the same rotation angles α_i, β_i, γ_i and translation t_{zi}. It is evident from (13) that the projective depths of a point in all frames have a similar trend of variation, in proportion to the value z_j of the space point. Note that the projective depths do not depend on the axial rotation γ_i at all. □

Proposition 3 Under small lateral rotations and the further assumption that the distance of the camera to the object is much greater than the depth of the object, i.e., t_{zi} ≫ z_j, the ratio of {λ_ij | i = 1, …, m} corresponding to any two different frames can be approximated by a constant.

Proof Taking the reference frame as an example, the ratio of the projective depths of any frame i to those of the reference frame can be written as

μ_i = \frac{λ_{rj}}{λ_{ij}} ≈ \frac{(Cβ_r Cα_r)z_j + t_{zr}}{(Cβ_i Cα_i)z_j + t_{zi}} = \frac{Cβ_r Cα_r (z_j/t_{zi}) + t_{zr}/t_{zi}}{Cβ_i Cα_i (z_j/t_{zi}) + 1}    (14)

where Cβ_i Cα_i ≤ 1. Under the assumption t_{zi} ≫ z_j, the ratio can be approximated by

μ_i = \frac{λ_{rj}}{λ_{ij}} ≈ \frac{t_{zr}}{t_{zi}}    (15)

All features in a frame have the same translation term. Thus, from (15), the projective depth ratios of two frames have the same approximation μ_i for all features. □

According to Proposition 3, we have λ_ij = (1/μ_i) λ_{rj}. Thus the perspective projection equation (1) can be approximated by

\frac{1}{μ_i} λ_{rj} x_{ij} = P_i X_j    (16)

Let us denote λ_{rj} by ℓ_j and reformulate (16) as

x_ij = P_{qi} X_{qj}    (17)

where

P_{qi} = μ_i P_i,  X_{qj} = ℓ_j X_j    (18)

We call (17) the quasi-perspective projection model. Compared with general perspective projection, the quasi-perspective model assumes that the projective depths of different frames are defined up to a constant μ_i. Thus the projective depths are implicitly embedded in the scalars of the homogeneous structure X_{qj} and the projection matrix P_{qi}, and the difficult problem of estimating the unknown depths is avoided. The model is more general than the affine projection model (3), where all projective depths are simply assumed to be equal.
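
A quick numeric check of Propositions 2 and 3 can be done with made-up pose values consistent with the small-lateral-rotation and large-distance assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-10.0, 10.0, size=(50, 3))   # points in a 20-unit cube

def depths(alpha, beta, tz, X):
    # lambda_ij = r3^T X_j + t_z, with r3 the last row of (11).
    r3 = np.array([-np.sin(beta),
                   np.cos(beta) * np.sin(alpha),
                   np.cos(beta) * np.cos(alpha)])
    return X @ r3 + tz

# Two frames: lateral rotations below 5 degrees, t_z >> object depth.
lam_r = depths(np.radians(2.0), np.radians(-3.0), 210.0, X)
lam_i = depths(np.radians(-4.0), np.radians(1.0), 205.0, X)

ratios = lam_r / lam_i
print(ratios.std() / ratios.mean())   # tiny spread: the ratio is ~constant
print(ratios.mean(), 210.0 / 205.0)   # and close to t_zr / t_zi, eq. (15)
```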


3.2 Error Analysis of Different Projection Models

In this subsection, we give a heuristic analysis of the imaging errors of the quasi-perspective and affine camera models with respect to general perspective projection. For simplicity, the frame subscript 'i' is omitted in the following. Suppose the intrinsic parameters of the cameras are known, and all images are normalized by applying the inverse K_i^{-1} to each frame. Then the projection matrices under the different projection models can be written as

P = \begin{bmatrix} r_1^T & t_x \\ r_2^T & t_y \\ r_3^T & t_z \end{bmatrix},  r_3^T = [−Sβ, CβSα, CβCα],    (19)

P_q = \begin{bmatrix} r_1^T & t_x \\ r_2^T & t_y \\ r_{3q}^T & t_z \end{bmatrix},  r_{3q}^T = [0, 0, CβCα],    (20)

P_a = \begin{bmatrix} r_1^T & t_x \\ r_2^T & t_y \\ 0^T & t_z \end{bmatrix},  0^T = [0, 0, 0]    (21)

where P is the projection matrix of perspective projection, P_q that of the quasi-perspective assumption, and P_a that of affine projection. Clearly, the main difference between these projection matrices lies only in the last row. For a space point X̄ = [x, y, z]^T, its projection under the different camera models is given by

m = P \begin{bmatrix} X̄ \\ 1 \end{bmatrix} = \begin{bmatrix} u \\ v \\ r_3^T X̄ + t_z \end{bmatrix},    (22)

m_q = P_q \begin{bmatrix} X̄ \\ 1 \end{bmatrix} = \begin{bmatrix} u \\ v \\ r_{3q}^T X̄ + t_z \end{bmatrix},    (23)

m_a = P_a \begin{bmatrix} X̄ \\ 1 \end{bmatrix} = \begin{bmatrix} u \\ v \\ t_z \end{bmatrix}    (24)

where

u = r_1^T X̄ + t_x,  v = r_2^T X̄ + t_y,    (25)
r_3^T X̄ = −(Sβ)x + (CβSα)y + (CβCα)z,    (26)
r_{3q}^T X̄ = (CβCα)z    (27)

and the non-homogeneous image points can be denoted as

m̄ = \frac{1}{r_3^T X̄ + t_z} \begin{bmatrix} u \\ v \end{bmatrix},    (28)

m̄_q = \frac{1}{r_{3q}^T X̄ + t_z} \begin{bmatrix} u \\ v \end{bmatrix},    (29)

m̄_a = \frac{1}{t_z} \begin{bmatrix} u \\ v \end{bmatrix}    (30)

The point m̄ is the ideal image under perspective projection. Let us define e_q = |m̄_q − m̄| as the error of quasi-perspective and e_a = |m̄_a − m̄| as the error of affine, where '| · |' stands for the norm of a vector. Then we have

e_q = |m̄_q − m̄| = \left| \frac{r_3^T X̄ + t_z}{r_{3q}^T X̄ + t_z} m̄ − m̄ \right| = \left| \frac{(r_3^T − r_{3q}^T) X̄}{r_{3q}^T X̄ + t_z} \right| |m̄| = \left| \frac{−(Sβ)x + (CβSα)y}{(CβCα)z + t_z} \right| |m̄|,    (31)

e_a = |m̄_a − m̄| = \left| \frac{r_3^T X̄ + t_z}{t_z} m̄ − m̄ \right| = \left| \frac{r_3^T X̄}{t_z} \right| |m̄| = \left| \frac{−(Sβ)x + (CβSα)y + (CβCα)z}{t_z} \right| |m̄|    (32)

Based on the above equations, we can state the following results for the different projection models.

1. The axial rotation angle γ around the Z-axis has no influence on the images m̄, m̄_q, and m̄_a.

2. When the distance of the camera to the object is much larger than the object depth, both m̄_q and m̄_a are close to m̄.

3. When the camera system is aligned with the world system, i.e., α = β = 0, we have r_{3q}^T = r_3^T = [0, 0, 1] and e_q = 0. Thus m̄_q = m̄, and the quasi-perspective assumption is equivalent to perspective projection.

4. When the rotation angles α and β are small, we have e_q < e_a, i.e., the quasi-perspective assumption is more accurate than the affine assumption; see the numeric spot check after this list.

5. When the space point lies on the plane through the world origin and perpendicular to the principal axis, i.e., the direction of r_3^T, so that α = β = 0 and z = 0, it is easy to verify that m̄ = m̄_q = m̄_a.
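
The spot check promised in item 4, comparing e_q and e_a for a single point; all pose numbers are arbitrary illustrations, not values from the paper:

```python
import numpy as np

def project(P, Xbar):
    """Non-homogeneous image of a 3D point under a 3x4 camera matrix."""
    x = P @ np.append(Xbar, 1.0)
    return x[:2] / x[2]

# Arbitrary small lateral rotations and a distant camera (gamma = 0).
alpha, beta = np.radians(5.0), np.radians(-4.0)
Ca, Sa, Cb, Sb = np.cos(alpha), np.sin(alpha), np.cos(beta), np.sin(beta)
t = np.array([1.0, -2.0, 200.0])

r1 = np.array([Cb, Sb * Sa, Sb * Ca])      # first two rows of (11)
r2 = np.array([0.0, Ca, -Sa])
r3 = np.array([-Sb, Cb * Sa, Cb * Ca])     # perspective, eq. (19)
r3q = np.array([0.0, 0.0, Cb * Ca])        # quasi-perspective, eq. (20)
r3a = np.zeros(3)                          # affine, eq. (21)

rows12 = [np.append(r1, t[0]), np.append(r2, t[1])]
P = np.vstack(rows12 + [np.append(r3, t[2])])
Pq = np.vstack(rows12 + [np.append(r3q, t[2])])
Pa = np.vstack(rows12 + [np.append(r3a, t[2])])

Xbar = np.array([4.0, -7.0, 9.0])          # an arbitrary test point
m, mq, ma = project(P, Xbar), project(Pq, Xbar), project(Pa, Xbar)
print(np.linalg.norm(mq - m), np.linalg.norm(ma - m))  # e_q << e_a
```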

    4 Quasi-Perspective Rigid Factorization

Under quasi-perspective projection (17), the factorization equation for a tracking matrix is expressed as

\begin{bmatrix} x_{11} & ⋯ & x_{1n} \\ ⋮ & ⋱ & ⋮ \\ x_{m1} & ⋯ & x_{mn} \end{bmatrix} = \begin{bmatrix} μ_1 P_1 \\ ⋮ \\ μ_m P_m \end{bmatrix} [ℓ_1 X_1, …, ℓ_n X_n]    (33)

    which can be written concisely as

W_{3m×n} = M_{3m×4} S_{4×n}    (34)


The form is similar to the perspective factorization (6). However, the projective depths in (33) are embedded in the motion and shape matrices; hence there is no need to estimate them explicitly. By performing SVD on the tracking matrix and imposing the rank-4 constraint, W may be factorized as M̂_{3m×4} Ŝ_{4×n}. However, the decomposition is not unique, since it is defined up to a nonsingular linear transformation H_{4×4} as M = M̂H and S = H^{-1}Ŝ. If a reasonable upgrading matrix is recovered, the Euclidean structure and motion can easily be recovered from the shape matrix S and motion matrix M. Due to the special form of (33), the recovery of the upgrading matrix has some special properties compared with those under affine and perspective projection. We show the computational details below.

    4.1 Recovery of the Euclidean Upgrading Matrix

We adopt the metric constraint to compute an upgrading matrix H_{4×4}. Let us split the matrix into two parts as

H = [H_l | H_r]    (35)

where H_l denotes the first three columns of H and H_r the fourth column. Suppose M̂_i is the i-th row triple of M̂. Then from M̂_iH = [M̂_iH_l | M̂_iH_r], we know that

M̂_i H_l = μ_i P_i^{(1:3)} = μ_i K_i R_i,    (36)
M̂_i H_r = μ_i P_i^{(4)} = μ_i K_i T_i    (37)

Let us denote C_i = M̂_i Q M̂_i^T, where Q = H_l H_l^T is a 4 × 4 symmetric matrix. As in previous factorization studies (Han and Kanade 2000; Quan 1996), we adopt a simplified camera model with only one parameter, K_i = diag(f_i, f_i, 1). Then from

C_i = M̂_i Q M̂_i^T = (μ_i K_i R_i)(μ_i K_i R_i)^T = μ_i^2 K_i K_i^T = μ_i^2 \begin{bmatrix} f_i^2 & & \\ & f_i^2 & \\ & & 1 \end{bmatrix}    (38)

we can obtain the following constraints:

C_i(1,2) = C_i(2,1) = 0
C_i(1,3) = C_i(3,1) = 0
C_i(2,3) = C_i(3,2) = 0
C_i(1,1) − C_i(2,2) = 0    (39)

Since the factorization (33) is defined up to a global scalar, W = MS = (εM)(S/ε), we may set μ_1 = 1 to avoid the trivial solution Q = 0. Thus we have 4m + 1 linear constraints in total on the 10 unknowns of Q, which can be solved via least squares. Ideally, Q is a positive semidefinite symmetric matrix, and the matrix H_l can then be recovered from Q via matrix decomposition.
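
As an illustration, the linear system can be assembled as in the following NumPy sketch; parameterizing the symmetric Q by its 10 upper-triangular entries and imposing μ_1 = 1 through C_1(3,3) = 1 are our implementation choices, not prescribed by the paper:

```python
import numpy as np

def sym_vec_coeffs(u, v):
    """Row of coefficients c such that u^T Q v = c . q for a symmetric
    4x4 Q parameterized by its 10 upper-triangular entries q."""
    A = np.outer(u, v)
    A = A + A.T                       # symmetrize the bilinear form
    c = A[np.triu_indices(4)].copy()
    c[[0, 4, 7, 9]] /= 2.0            # diagonal entries are counted once
    return c

def solve_Q(M_hat, m):
    """Least-squares solution of the metric constraints (39)."""
    rows, rhs = [], []
    for i in range(m):
        Mi = M_hat[3 * i:3 * i + 3]   # i-th camera triple of M_hat
        rows += [sym_vec_coeffs(Mi[0], Mi[1]),
                 sym_vec_coeffs(Mi[0], Mi[2]),
                 sym_vec_coeffs(Mi[1], Mi[2]),
                 sym_vec_coeffs(Mi[0], Mi[0]) - sym_vec_coeffs(Mi[1], Mi[1])]
        rhs += [0.0, 0.0, 0.0, 0.0]
    # mu_1 = 1  =>  C_1(3,3) = 1, ruling out the trivial solution Q = 0.
    M1 = M_hat[0:3]
    rows.append(sym_vec_coeffs(M1[2], M1[2]))
    rhs.append(1.0)
    q = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)[0]
    Q = np.zeros((4, 4))
    Q[np.triu_indices(4)] = q
    return Q + np.triu(Q, 1).T        # fill in the symmetric lower part
```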

Definition 4 (Vertical extended upper triangular matrix) Suppose U is an n × k (n > k) matrix. We call U a vertical extended upper triangular matrix if it is of the form

U_{ij} = \begin{cases} u_{ij} & \text{if } i ≤ j + (n − k) \\ 0 & \text{if } i > j + (n − k) \end{cases}    (40)

where U_{ij} denotes the (i, j)-th element of U, and u_{ij} is a scalar. For example, an n × (n − 1) vertical extended upper triangular matrix can be written explicitly as

U = \begin{bmatrix} u_{11} & u_{12} & ⋯ & u_{1(n−1)} \\ u_{21} & u_{22} & ⋯ & u_{2(n−1)} \\ & u_{32} & ⋯ & u_{3(n−1)} \\ & & ⋱ & ⋮ \\ & & & u_{n(n−1)} \end{bmatrix}    (41)

Proposition 5 (Extended Cholesky Decomposition) Suppose Q_n is an n × n positive semidefinite symmetric matrix of rank k. Then it can be decomposed as Q_n = H_k H_k^T, where H_k is an n × k matrix of rank k. Furthermore, the decomposition can be written as Q_n = Φ_k Φ_k^T with Φ_k an n × k vertical extended upper triangular matrix. The number of degrees of freedom of the matrix Q_n is nk − (1/2)k(k − 1), which is the number of unknowns in Φ_k.

The proof of Proposition 5 is given in Appendix 1. From the Extended Cholesky Decomposition we can easily obtain the following result.

Result 6 The matrix Q recovered from (39) is a 4 × 4 positive semidefinite symmetric matrix of rank 3. It can be decomposed as Q = H_l H_l^T, where H_l is a 4 × 3 matrix of rank 3. The decomposition can be further written as Q = Φ_3 Φ_3^T with Φ_3 a 4 × 3 vertical extended upper triangular matrix.

The computation of H_l is straightforward. Suppose the SVD of Q is U_4 Σ_4 U_4^T, where U_4 is a 4 × 4 orthogonal matrix and Σ_4 = diag(σ_1, σ_2, σ_3, 0) is a diagonal matrix with σ_i the singular values of Q. Then we immediately have

H_l = U^{(1:3)} \begin{bmatrix} √σ_1 & & \\ & √σ_2 & \\ & & √σ_3 \end{bmatrix}    (42)

where U^{(1:3)} denotes the first three columns of U. The vertical extended upper triangular matrix Φ_3 can then be constructed from H_l as shown in Appendix 1. The computation is an extension of the Cholesky Decomposition to positive semidefinite symmetric matrices, whereas the general Cholesky Decomposition can only be applied to positive definite symmetric matrices. From the number of unknowns in Φ_3 we know that Q has only 9 degrees of freedom.
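
A minimal sketch of (42), assuming Q is positive semidefinite of rank 3 (for a symmetric PSD matrix the SVD coincides with the eigendecomposition):

```python
import numpy as np

def upgrading_left(Q):
    """H_l from Q = H_l H_l^T as in (42). Q is assumed symmetric
    positive semidefinite of rank 3; the smallest singular value
    (ideally zero) is truncated."""
    U, s, _ = np.linalg.svd(Q)
    return U[:, :3] * np.sqrt(s[:3])      # 4 x 3, unique up to a rotation

# Check on a Q built to be rank-3 PSD by construction.
rng = np.random.default_rng(2)
Hl_true = rng.standard_normal((4, 3))
Q = Hl_true @ Hl_true.T
Hl = upgrading_left(Q)
print(np.allclose(Hl @ Hl.T, Q))          # True
```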


Remark 7 In Result 6, we assume Q is positive semidefinite. However, the recovered matrix Q may be negative definite in the case of noisy data, and then we cannot adopt the above method to decompose it into the form H_l H_l^T or Φ_3 Φ_3^T. In this case, let us denote

Φ_3 = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ & h_7 & h_8 \\ & & h_9 \end{bmatrix}    (43)

and substitute Φ_3 Φ_3^T for the matrix Q in (38). A best estimate of Φ_3 in (43) can then be obtained by minimizing the cost function

J_1 = \min_{Φ_3} \frac{1}{2} \sum_{i=1}^{m} \left( C_i^2(1,2) + C_i^2(1,3) + C_i^2(2,3) + (C_i(1,1) − C_i(2,2))^2 \right)    (44)

The minimization can be carried out by any nonlinear optimization technique, such as Gradient Descent or the Levenberg-Marquardt (LM) algorithm.
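
A sketch of this nonlinear fallback using SciPy's least_squares; the extra residual fixing the global scale is our addition, mirroring the μ_1 = 1 normalization of the linear case:

```python
import numpy as np
from scipy.optimize import least_squares

def phi3(h):
    """4x3 vertical extended upper triangular matrix (43) from 9 parameters."""
    P = np.zeros((4, 3))
    P[0] = h[0:3]
    P[1] = h[3:6]
    P[2, 1:] = h[6:8]
    P[3, 2] = h[8]
    return P

def residuals(h, M_hat, m):
    """Per-frame metric residuals of the cost J_1 in (44), plus one
    residual enforcing C_1(3,3) = 1 (our assumption, equivalent to
    mu_1 = 1) to rule out the trivial zero solution."""
    Q = phi3(h) @ phi3(h).T
    res = []
    for i in range(m):
        C = M_hat[3 * i:3 * i + 3] @ Q @ M_hat[3 * i:3 * i + 3].T
        res += [C[0, 1], C[0, 2], C[1, 2], C[0, 0] - C[1, 1]]
    C1 = M_hat[0:3] @ Q @ M_hat[0:3].T
    res.append(C1[2, 2] - 1.0)
    return np.asarray(res)

# Usage sketch, with M_hat from the rank-4 SVD step over m frames:
# sol = least_squares(residuals, np.ones(9), args=(M_hat, m), method='lm')
# Hl = phi3(sol.x)
```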

Remark 8 In Result 6, we claim that the symmetric matrix Q can be decomposed into Φ_3 Φ_3^T. In practice, the recovery of Φ_3 is unnecessary, since the upgrading matrix (35) is not unique; we can simply decompose the matrix into H_l H_l^T as shown in (42). However, this decomposition is impossible for a negative definite matrix Q. In such cases, it is suggested to parameterize Q by Φ_3, since the vertical extended upper triangular form (43) removes 3 unknowns. Hence we only need to optimize 9 parameters in the minimization scheme (44).

We now show how to recover the right part H_r of the upgrading matrix (35). From the quasi-perspective equation (17), we have

x_ij = (μ_i P_i^{(1:3)})(ℓ_j X̄_j) + (μ_i P_i^{(4)}) ℓ_j    (45)

Summing the coordinates of all the features in the i-th frame gives

\sum_{j=1}^{n} x_{ij} = μ_i P_i^{(1:3)} \sum_{j=1}^{n} (ℓ_j X̄_j) + μ_i P_i^{(4)} \sum_{j=1}^{n} ℓ_j    (46)

where μ_i P_i^{(1:3)} can be recovered from M̂_i H_l, and μ_i P_i^{(4)} = M̂_i H_r. Since the world coordinate system can be chosen freely, we may set \sum_{j=1}^{n} (ℓ_j X̄_j) = 0, which is equivalent to setting the origin of the world system at the gravity center of the scaled space points. On the other hand, since the reconstruction is defined up to a global scalar, we may simply set \sum_{j=1}^{n} ℓ_j = 1. Thus equation (46) simplifies to

M̂_i H_r = \sum_{j=1}^{n} x_{ij} = \begin{bmatrix} \sum_j u_{ij} \\ \sum_j v_{ij} \\ n \end{bmatrix}    (47)

which provides 3 linear constraints on the four unknowns of H_r. Therefore, we obtain 3m equations from the sequence, and H_r can be recovered via linear least squares.

From the above analysis, we note that the solution for H_r is not unique, as it depends on the selection of the world origin \sum_{j=1}^{n} (ℓ_j X̄_j) and the global scalar \sum_{j=1}^{n} ℓ_j. In fact, H_r may be set freely, as shown in the following proposition.

Proposition 9 (Recovery of H_r) Suppose H_l in (35) is already recovered. Let us construct a matrix H̃ = [H_l | H̃_r], where H̃_r is an arbitrary 4-vector that is independent of the three columns of H_l. Then H̃ must be a valid upgrading matrix, i.e., M̃ = M̂H̃ is a valid Euclidean motion matrix, and S̃ = H̃^{-1}Ŝ corresponds to a valid Euclidean shape matrix.

The proof can be found in Appendix 2. According to Proposition 9, H_r can be set to any 4-vector that is independent of H_l. In practice, H_r may be set from the SVD of H_l:

H_l = U_{4×4} Σ_{4×3} V_{3×3}^T = [u_1, u_2, u_3, u_4] \begin{bmatrix} σ_1 & 0 & 0 \\ 0 & σ_2 & 0 \\ 0 & 0 & σ_3 \\ 0 & 0 & 0 \end{bmatrix} [v_1, v_2, v_3]^T    (48)

where U and V are orthogonal matrices and Σ is a diagonal matrix of the three singular values. Choosing an arbitrary value σ_r between the largest and the smallest singular values of H_l, we may set

H_r = σ_r u_4,  H = [H_l, H_r]    (49)

This construction guarantees that H is invertible and has the same condition number as H_l, so that good precision is obtained in computing the inverse H^{-1}.
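
A sketch of the construction (48)-(49); taking σ_r as the geometric mean of the extreme singular values is one arbitrary valid choice:

```python
import numpy as np

def complete_upgrading(Hl):
    """H = [H_l | H_r] following (48)-(49): H_r is taken along the left
    null direction u_4 of H_l, scaled by a value between the extreme
    singular values (the geometric mean is one arbitrary valid choice)."""
    U, s, _ = np.linalg.svd(Hl)           # Hl is 4 x 3, U is 4 x 4
    sigma_r = np.sqrt(s[0] * s[2])
    return np.column_stack([Hl, sigma_r * U[:, 3]])
```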

After recovering the Euclidean motion and shape matrices, the intrinsic parameters and pose of the camera associated with each frame can easily be computed as follows:

μ_i = ‖M_{i(3)}^{(1:3)}‖,    (50)

f_i = \frac{1}{μ_i} ‖M_{i(1)}^{(1:3)}‖ = \frac{1}{μ_i} ‖M_{i(2)}^{(1:3)}‖,    (51)

R_i = \frac{1}{μ_i} K_i^{-1} M_i^{(1:3)},  T_i = \frac{1}{μ_i} K_i^{-1} M_i^{(4)}    (52)


where M_{i(t)}^{(1:3)} denotes the t-th row of M_i^{(1:3)}. The result is obtained under the quasi-perspective assumption, which is a close approximation to general perspective projection. The solution may be further refined to perspective projection by minimizing the image reprojection residuals

J_2 = \min_{K_i, R_i, T_i, μ_i, X_j} \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{n} |x̄_{ij} − x̂_{ij}|^2    (53)

where x̂_ij denotes the reprojected image point computed via perspective projection (1). The minimization process is termed bundle adjustment (Hartley and Zisserman 2004), which is usually solved via Levenberg-Marquardt iterations.
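
A sketch of the per-frame parameter extraction (50)-(52); averaging the two norm estimates of f_i in (51) is our choice for handling noisy data:

```python
import numpy as np

def camera_from_motion(Mi):
    """Recover mu_i, f_i, R_i, T_i from the i-th 3x4 block M_i of the
    Euclidean motion matrix, following (50)-(52)."""
    A, b = Mi[:, :3], Mi[:, 3]
    mu = np.linalg.norm(A[2])                      # eq. (50)
    # eq. (51) gives two estimates of f_i; average them against noise.
    f = 0.5 * (np.linalg.norm(A[0]) + np.linalg.norm(A[1])) / mu
    K = np.diag([f, f, 1.0])
    R = np.linalg.solve(K, A) / mu                 # eq. (52)
    T = np.linalg.solve(K, b) / mu
    return mu, f, R, T
```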

    4.2 Outline of the Algorithm

The implementation of the rigid factorization algorithm is summarized as follows.

Algorithm 10 (Quasi-perspective rigid factorization) Given the tracking matrix W ∈ R^{3m×n} of a sequence with small camera movements, compute the Euclidean structure and motion parameters under quasi-perspective projection.

1. Balance the tracking matrix via point-wise and image-wise rescalings, as in Sturm and Triggs (1996), to improve numerical stability;
2. Perform rank-4 SVD factorization on the tracking matrix to obtain a solution of M̂ and Ŝ;
3. Compute the left part H_l of the upgrading matrix according to (42), or (44) for a negative definite matrix Q;
4. Compute H_r and H according to (49);
5. Recover the Euclidean motion matrix M = M̂H and shape matrix S = H^{-1}Ŝ;
6. Estimate the camera parameters and pose from (50) to (52);
7. Optimize the solution via bundle adjustment (53).

Remark 11 In the above analysis, as in other factorization algorithms, we assume a one-parameter camera model as in (38) so that we may use this constraint to recover an upgrading matrix H. When the one-parameter assumption is not satisfied in real applications, it is possible to take the proposed solution as an initial value and optimize the camera parameters via the Kruppa constraint arising from pairwise images (Wang et al. 2008).

Remark 12 The essence of quasi-perspective factorization (34) is to find a rank-4 approximation MS of the tracking matrix, i.e., to minimize the Frobenius norm ‖W − MS‖_F^2. Most studies adopt the SVD of W and truncate it to the desired rank. However, when the tracking matrix is incomplete, e.g., when some features are missing in some frames due to occlusions, it is hard to perform the SVD. In the case of missing data, we can replace step 2 in Algorithm 10 with the power factorization algorithm (Hartley and Schaffalizky 2003; Wang and Wu 2008a) to obtain a least-squares solution of M̂ and Ŝ, and then upgrade the solution to Euclidean space according to the proposed scheme.
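
The following is a generic alternating least-squares sketch in the spirit of power factorization, not the exact algorithm of the cited papers; mask marks the observed entries:

```python
import numpy as np

def power_factorization(W, mask, rank=4, n_iter=100):
    """Alternating least-squares low-rank fit with missing data.

    W    : 3m x n tracking matrix (missing entries arbitrary).
    mask : boolean array, True where an entry of W was observed.
    Minimizes ||mask * (W - M S)||_F^2 over M (3m x rank), S (rank x n).
    """
    rng = np.random.default_rng(0)
    M = rng.standard_normal((W.shape[0], rank))
    S = np.zeros((rank, W.shape[1]))
    for _ in range(n_iter):
        # Each column of S: least squares over its observed rows.
        for j in range(W.shape[1]):
            o = mask[:, j]
            S[:, j] = np.linalg.lstsq(M[o], W[o, j], rcond=None)[0]
        # Each row of M: least squares over its observed columns.
        for i in range(W.shape[0]):
            o = mask[i]
            M[i] = np.linalg.lstsq(S[:, o].T, W[i, o], rcond=None)[0]
    return M, S
```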

    5 Quasi-perspective Nonrigid Factorization

For nonrigid factorization, we still follow Bregler's assumption (7) and represent a nonrigid shape by a weighted combination of k shape bases. Under quasi-perspective projection, the structure is expressed in homogeneous form with nonzero scalars. Let us denote the scale-weighted nonrigid structure associated with the i-th frame by S̄_i = [ℓ_1 X̄_1, …, ℓ_n X̄_n], and the l-th scale-weighted shape basis by B_l = [ℓ_1 X̄_{l1}, …, ℓ_n X̄_{ln}]. Then from (7) we have

X̄_j = \sum_{l=1}^{k} ω_{il} X̄_{lj},  j = 1, …, n    (54)

Multiplying both sides by the weight scale ℓ_j gives

ℓ_j X̄_j = ℓ_j \sum_{l=1}^{k} ω_{il} X̄_{lj} = \sum_{l=1}^{k} ω_{il} (ℓ_j X̄_{lj}),  j = 1, …, n    (55)

from which we immediately have the following result:

S_i = \begin{bmatrix} S̄_i \\ ℓ^T \end{bmatrix} = \begin{bmatrix} \sum_{l=1}^{k} ω_{il} B_l \\ ℓ^T \end{bmatrix}    (56)

where ℓ = [ℓ_1, …, ℓ_n]^T.

We call (56) the extension of Bregler's assumption to the homogeneous case. Under this extension, the quasi-perspective projection of the i-th frame can be formulated as

W_i = (μ_i P_i) S_i = [μ_i P_i^{(1:3)}, μ_i P_i^{(4)}] \begin{bmatrix} \sum_{l=1}^{k} ω_{il} B_l \\ ℓ^T \end{bmatrix} = [ω_{i1} μ_i P_i^{(1:3)}, …, ω_{ik} μ_i P_i^{(1:3)}, μ_i P_i^{(4)}] \begin{bmatrix} B_1 \\ ⋮ \\ B_k \\ ℓ^T \end{bmatrix}    (57)

Thus the nonrigid factorization under quasi-perspective projection can be expressed as

W_{3m×n} = \begin{bmatrix} ω_{11} μ_1 P_1^{(1:3)} & ⋯ & ω_{1k} μ_1 P_1^{(1:3)} & μ_1 P_1^{(4)} \\ ⋮ & ⋱ & ⋮ & ⋮ \\ ω_{m1} μ_m P_m^{(1:3)} & ⋯ & ω_{mk} μ_m P_m^{(1:3)} & μ_m P_m^{(4)} \end{bmatrix} \begin{bmatrix} B_1 \\ ⋮ \\ B_k \\ ℓ^T \end{bmatrix}    (58)


or, concisely in matrix form, as

W_{3m×n} = M_{3m×(3k+1)} B_{(3k+1)×n}    (59)

The factorization expression is similar to (9); however, the difficult problem of estimating the projective depths is avoided here. The rank of the tracking matrix is at most 3k + 1, and the factorization is again defined up to a transformation matrix H ∈ R^{(3k+1)×(3k+1)}. Suppose the SVD factorization of the tracking matrix under the rank constraint is W = M̂B̂. Similar to the rigid case, we can adopt the metric constraint to compute an upgrading matrix. Let us split the matrix into k + 1 parts as

H = [H_1, …, H_k | H_r]    (60)

where H_l ∈ R^{(3k+1)×3} (l = 1, …, k) denotes the l-th column triple of H, and H_r denotes the last column of H. Then we have

M̂_i H_l = ω_{il} μ_i P_i^{(1:3)} = ω_{il} μ_i K_i R_i,    (61)
M̂_i H_r = μ_i P_i^{(4)} = μ_i K_i T_i    (62)

Similar to (38) in the rigid case, denoting C_{ii′} = M̂_i Q_l M̂_{i′}^T with Q_l = H_l H_l^T, we get

C_{ii′} = M̂_i Q_l M̂_{i′}^T = (ω_{il} μ_i K_i R_i)(ω_{i′l} μ_{i′} K_{i′} R_{i′})^T = ω_{il} ω_{i′l} μ_i μ_{i′} K_i (R_i R_{i′}^T) K_{i′}^T    (63)

where i and i′ (= 1, …, m) correspond to different frame numbers, and l = 1, …, k corresponds to the different shape bases. Assuming a simplified camera model with only one parameter, K_i = diag(f_i, f_i, 1), we have

C_{ii} = M̂_i Q_l M̂_i^T = ω_{il}^2 μ_i^2 \begin{bmatrix} f_i^2 & & \\ & f_i^2 & \\ & & 1 \end{bmatrix}    (64)

from which we obtain the following four constraints:

f_1(Q_l) = C_{ii}(1,2) = 0
f_2(Q_l) = C_{ii}(1,3) = 0
f_3(Q_l) = C_{ii}(2,3) = 0
f_4(Q_l) = C_{ii}(1,1) − C_{ii}(2,2) = 0    (65)

The above constraints are similar to (39) in the rigid case. However, the matrix Q_l in (64) is a (3k + 1) × (3k + 1) symmetric matrix. According to Proposition 5, it has 9k degrees of freedom, since it can be decomposed into the product of a (3k + 1) × 3 vertical extended upper triangular matrix and its transpose. Given m frames, we have 4m linear constraints on Q_l. It appears that, with enough features and frames, the matrix Q_l could be solved linearly by stacking all the constraints in (65). Unfortunately, the rotation constraints alone may be insufficient when an object deforms at varying speed, since most of the constraints are redundant. Xiao and Kanade (2005) proposed a basis constraint to resolve this ambiguity. The main idea is to select k frames that include independent shapes and treat them as a set of bases. Suppose the first k frames are independent of each other; then their corresponding weighting coefficients can be set as

ω_{il} = \begin{cases} 1 & \text{if } i, l = 1, …, k \text{ and } i = l \\ 0 & \text{if } i, l = 1, …, k \text{ and } i ≠ l \end{cases}    (66)

From (63) we obtain the following basis constraint:

C_{ii′} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}  if i = 1, …, k,  i′ = 1, …, m,  and i ≠ l    (67)

    straints to the matrix Ql (some of the constraints are redun-dant since Ql is symmetric). By combining the rotation con-straint (65) and basis constraint (67) together, the matrix Qlcan be computed linearly. Later, Hl , l = 1, . . . , k can be de-composed from Ql according to following result.

Result 13 The matrix Q_l is a (3k + 1) × (3k + 1) positive semidefinite symmetric matrix of rank 3. It can be decomposed as Q_l = H_l H_l^T, where H_l is a (3k + 1) × 3 matrix of rank 3. The decomposition can be further written as Q_l = Φ_3 Φ_3^T with Φ_3 a (3k + 1) × 3 vertical extended upper triangular matrix.

This result can easily be derived from Proposition 5. Note that Proposition 9 is still valid in the nonrigid case. Thus the vector H_r in (60) can be set to an arbitrary (3k + 1)-vector that is independent of all columns in {H_l}_{l=1,…,k}. After recovering the Euclidean upgrading matrix, the camera parameters, motions, shape bases, and weighting coefficients can easily be determined from the motion matrix M = M̂H and shape matrix B = H^{-1}B̂.

    6 Evaluations on Synthetic Data

    6.1 Evaluation on Quasi-perspective Projection

During the simulation, we randomly generated 200 points within a cube of 20 × 20 × 20 in space, as shown in Fig. 2(a), where only the first 50 points are displayed for simplicity. The depth variation of the space points in the Z-direction is shown in Fig. 2(b). We simulated 10 images of these points by perspective projection. The image size is set to 800 × 800.


Fig. 2 Evaluation of the projective depth approximation for the first 50 points. (a) and (b) Coordinates of the synthetic space points. (c) and (d) The real and the approximated projective depths under the quasi-perspective assumption

The camera parameters are set as follows. The focal lengths are set randomly between 900 and 1100, the principal point is set at the image center, and the skew is zero. The rotation angles are set randomly between ±5°. The X and Y positions of the cameras are set randomly between ±15, while the Z positions are set evenly from 200 to 220. The true projective depths λ_ij associated with these points across the 10 views are shown in Fig. 2(c), where the values are normalized so that they have unit mean. We then estimate λ_1j and μ_i from (13) and (14), and construct the estimated projective depths as λ̂_ij = λ_1j/μ_i. The registered result is shown in Fig. 2(d). We can see from the experiment that the recovered projective depths are very close to the ground truth, and are generally proportional to the variation of the space points in the Z-direction. If we adopt the affine camera model, this is equivalent to setting all the projective depths to 1; the error is obviously much bigger than that of the quasi-perspective assumption.

According to the projection equations (28) to (32), different images are obtained under different camera models. Here we generated three sets of images from the simulated space points via the general perspective projection model, the affine camera model, and the quasi-perspective projection model, and compared the errors of the quasi-perspective projection model (31) and the affine assumption (32). The mean errors of the different models in each frame are shown in Fig. 3(a); the histogram distribution of the errors for all 200 points across 10 frames is shown in Fig. 3(b). From the result, we can see that the error of the quasi-perspective assumption is much smaller than that under the affine assumption.

The influence of different imaging conditions on the quasi-perspective assumption is also investigated. Initially, we fix the camera position as given in the first test and vary the amplitude of the rotation angles from ±5° to ±50° in steps of 5°. At each step, we check the relative error of the recovered projective depths, which is defined as

e_ij = \frac{|λ_{ij} − λ̂_{ij}|}{λ_{ij}} × 100 (%)    (68)

where λ̂_ij is the estimated projective depth. We carried out 100 independent tests at each step so as to obtain a statistically meaningful result. The mean and standard deviation of e_ij are shown in Fig. 4(a). We then fix the rotation angles at ±5° and vary the relative distance of the camera to the object (i.e., the ratio between the distance from the camera to the object center and the object depth) from 2 to 20 in steps of 2. The mean and standard deviation of e_ij at each step over 100 tests are shown in Fig. 4(b). The result shows that the quasi-perspective projection is a good approximation


Fig. 3 Evaluation of the imaging errors of different camera models. (a) The mean error in each frame. (b) The histogram distribution of the errors under the quasi-perspective and affine projection models

Fig. 4 Evaluation of quasi-perspective projection under different imaging conditions. (a) The relative error of the estimated depths for different rotation angles. (b) The relative error with respect to different relative distances

(e_ij < 0.5%) when the rotation angles are less than ±35° and the relative distance is larger than 6. Please note that this result is obtained from noise-free data.

    6.2 Evaluation on Rigid Factorization

We added Gaussian white noise to the initially generated 10 images and varied the noise level from 0 to 3 pixels in steps of 0.5. At each noise level, we reconstructed the 3D structure of the object, which is defined up to a similarity transformation with respect to the ground truth. We register the reconstructed model with the ground truth and calculate the reconstruction error, defined as the mean pointwise distance between the reconstructed structure and the ground truth. The mean and standard deviation of the error over 100 independent tests are shown in Fig. 5. The proposed algorithm (Quasi) is compared with Poelman and Kanade (1997) under the affine assumption (Affine) and Han and Kanade (2000) under perspective projection (Persp). We then take these solutions as initial values and perform the perspective optimization through LM iterations. It is evident that the

    posed method performs much better than that of affine, theoptimized solution (Quasi+LM) is very close to perspectiveprojection with optimization (Persp+LM).

    The proposed model is based on the assumption of largerelative camera-to-object distance and small camera rota-tions. We compared the effect of the two factors to differ-ent camera models. In first case, we vary the relative dis-tance from 4 to 18 in steps of 2. At each relative distance,we generated 20 images with the following parameters. Therotation angles are confined between ±5◦, the X and Y po-sitions of the camera are set randomly between ±15. We re-covered the structure and computed the reconstruction errorfor each group of images. The mean error by different meth-ods is shown in Fig. 6(a). In the second case, we increasethe rotation angles to the range of ±20◦, and retain othercamera parameters as in the first case. The mean reconstruc-tion error is given in Fig. 6(b). The results are evaluated on100 independence tests with 1-pixel Gaussian noise. We canobtain the following conclusions from the results. (1) The er-ror by quasi-perspective projection is consistently less thanthat by affine, especially at small relative distances. (2) Both


Fig. 5 Evaluation of rigid factorization. The mean (a) and standard deviation (b) of the reconstruction errors of different algorithms at different noise levels

Fig. 6 The mean reconstruction error of different projection models with respect to varying relative distance. The rotation angles of the camera are confined to a range of (a) ±5° and (b) ±20°

reconstruction errors of affine and quasi-perspective projection increase greatly when the relative distance is less than 6, since both models are based on the large-distance assumption. (3) The error at each relative distance increases with the rotation angles, especially at small relative distances, since the projective depths are related to the rotation angles. (4) Theoretically, the relative distance and rotation angles have no influence on the result of full perspective projection. However, we can see that the error of perspective projection also increases slightly with increasing rotation angles and decreasing relative distance. This is because we estimate the projective depths iteratively starting from an affine assumption (Han and Kanade 2000), and the iteration easily gets stuck in local minima due to bad initialization.

We compared the computation time of the different factorization algorithms without LM optimization. The program was implemented in Matlab 6.5 on an Intel Pentium 4 3.6 GHz CPU. In this test, we use all 200 feature points and vary the frame number from 5 to 200 so as to generate different data sizes. The actual computation times (in seconds) for the different data sets are listed in Table 2, where the computation time

    Table 2 The average computation time of different algorithms

    Frame number 5 10 50 100 150 200

    Time (s) Affine 0.015 0.015 0.031 0.097 0.156 0.219

    Quasi 0.015 0.016 0.047 0.156 0.297 0.531

    Persp 0.281 0.547 3.250 6.828 10.58 15.25

    for perspective projection is taken under 10 iterations (itusually takes about 30 iterations to compute the projectivedepths in perspective factorization). Clearly, the computa-tion time of quasi-perspective is at the same level as thatunder affine assumption, While the perspective factorizationis computationally more intensive than other methods.

    6.3 Evaluation on Nonrigid Factorization

Fig. 7 Simulation result on nonrigid factorization. (a) Two synthetic cubes with moving points in space. (b) The quasi-perspective factorization result of two frames superimposed on the ground truth. (c) The final structures after optimization

Fig. 8 Evaluation on nonrigid factorization. The mean (a) and standard deviation (b) of the reconstruction errors by different algorithms at different noise levels

In this test, we generated a synthetic cube with 6 evenly distributed points on each visible edge. Three sets of points on adjacent surfaces of the cube move across the surfaces at constant speed, as shown in Fig. 7(a); each moving set is composed of 5 points. The cube with the moving points can be taken as a nonrigid object with 2 shape bases. We generated 10 frames with the same camera parameters as in the first test of the rigid case. We reconstructed the structure associated with each frame by the proposed method, as shown in Fig. 7(b) and (c). The structure after optimization is visually identical to the ground truth, while the result before optimization is slightly deformed due to the perspective effect.
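To see why the moving-point cube has exactly 2 shape bases, note that each frame's shape can be written as S_t = B1 + t·B2, where B1 holds the rest positions and B2 is nonzero only at the moving points. A hypothetical sketch (the static point count and velocity values are illustrative assumptions, not the exact values used above):

    import numpy as np

    n_static, n_moving = 90, 15                 # illustrative counts (3 sets of 5 movers)
    rng = np.random.default_rng(0)
    B1 = rng.random((3, n_static + n_moving))   # rest positions of all points
    B2 = np.zeros((3, n_static + n_moving))
    B2[:, n_static:] = np.array([[0.02], [0.01], [0.0]])  # constant velocities of movers

    def shape_at(t):
        # Two shape bases: S_t = c1*B1 + c2*B2 with coefficients c1 = 1, c2 = t.
        return B1 + t * B2

    frames = [shape_at(t) for t in range(10)]   # 10 frames, as in the experiment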

We compared our method with nonrigid factorization under the affine assumption (Xiao et al. 2006) and under perspective projection (Xiao and Kanade 2005). The mean and standard deviation of the reconstruction errors with respect to different noise levels are shown in Fig. 8. It is clear that the proposed method performs much better than that under the affine camera model.

    7 Evaluation on Real Image Sequences

We tested the proposed method on many real sequences, and we report the results of four experiments here. All images in the tests, except those in the Franck face sequence, were captured by a Canon PowerShot G3 camera at a resolution of 1024 × 768. In order to ensure a large overlap of the object to be reconstructed, the camera undergoes only small movements during image acquisition; hence the quasi-perspective assumption is satisfied for all these sequences. Please refer to the supplemental video for details of these test results.

Fig. 9 Reconstruction result of the stone post sequence. (a) Three images from the sequence, where the tracked features with relative disparities are overlaid on the second and third images. (b) The reconstructed VRML model of the scene shown from different viewpoints with texture mapping. (c) The corresponding triangulated wireframe of the reconstructed model

    7.1 Test on Stone Post Sequence

There are 8 images in the stone post sequence, which were taken at the Sculpture Park near downtown Windsor. We established the initial correspondences using the technique in Wang (2006) and eliminated outliers iteratively as in Torr et al. (1998). In total, 3693 reliable features were tracked across the sequence; the features in two frames with relative disparities are shown in Fig. 9. We recovered the 3D structure of the object and the camera motions using the proposed algorithm, as well as some previous methods. The recovered camera focal lengths are listed in Table 3, where we give the result for the first frame only due to limited space; 'Quasi+LM', 'Affine+LM', and 'Persp+LM' stand for quasi-perspective, affine, and perspective factorization with global optimization, respectively. Figure 9 shows the reconstructed VRML model with texture and the corresponding triangulated wireframe viewed from different viewpoints. The reconstructed model is visually plausible and realistic.

In order to give a comparative quantitative evaluation, we reproject the reconstructed 3D structure back onto the images and calculate the reprojection errors, defined as the distances between the detected and reprojected image points. Figure 10 shows the histogram distributions of the errors using 9 bins. The corresponding mean ('Mean') and standard deviation ('STD') of the errors are listed in Table 3. We can see that the reprojection error of the proposed model is much smaller than that under the affine assumption.
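The per-point statistics behind Fig. 10 and Table 3 can be computed as in the following sketch (our illustration; the detected and reprojected points are assumed to be given as 2×N arrays):

    import numpy as np

    def reprojection_stats(x_detected, x_reprojected, bins=9):
        # Per-point reprojection error: distance between detected and reprojected points.
        err = np.linalg.norm(x_detected - x_reprojected, axis=0)
        hist, edges = np.histogram(err, bins=bins)  # 9-bin histogram as in Fig. 10
        return err.mean(), err.std(), hist, edges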


    7.2 Test on Fountain Base Sequence

There are 7 images in the fountain base sequence, which were also taken at the Sculpture Park of Windsor. The correspondences were established using the same technique as in the previous test. In total, 4218 reliable features were tracked across the sequence, as shown in Fig. 11(a). Figure 11(b) and (c) show the reconstructed VRML model with texture mapping and the corresponding triangulated wireframe from different viewpoints. The model looks realistic, and most details are correctly recovered by the method. A comparative analysis of the camera parameters and reprojection errors is presented in Table 3 and Fig. 10, respectively. The results show that the proposed scheme outperforms that under the affine camera model.

    7.3 Test on Dynamic Grid Sequence

There are 12 images in the dynamic grid sequence. The background of the sequence consists of two orthogonal sheets with square grids, which are used as ground truth for evaluation. On the two orthogonal surfaces, there are three moving objects that move linearly in three directions. We established correspondences using the method of Wang (2006) and eliminated outliers interactively. In total, 206 features were tracked across the sequence, of which 140 belong to the static background and 66 belong to the three moving objects, as shown in Fig. 12(a). We recovered the metric structure of the scenario using the proposed method. Figure 12(b) and (c) show the reconstructed VRML models and corresponding wireframes associated with two dynamic positions. It is clear that the dynamic structure is correctly recovered.

Table 3 Camera parameters of the first frame and reprojection errors in the real sequence tests

Sequence        Method      Focus (f)   Mean    STD
Stone post      Quasi+LM    2151.8      0.421   0.292
                Affine+LM   2167.3      0.667   0.461
                Persp+LM    2154.6      0.237   0.164
Fountain base   Quasi+LM    2140.5      0.418   0.285
                Affine+LM   2153.4      0.629   0.439
                Persp+LM    2131.7      0.240   0.168

We take the two orthogonal grid sheets in the background as ground truth and compute the angle (in degrees) between the two reconstructed surfaces of the orthogonal background, the length ratio of the two diagonals of each square grid cell, and the angle formed by the two diagonals. The mean errors of these three values are denoted by Eα1, Erat, and Eα2, respectively. The mean reprojection error Erep1 of the reconstructed structure is also computed. For comparison, the results obtained by the different methods are listed in Table 4. The result of the proposed model outperforms that of affine.
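These ground-truth checks can be sketched as follows (an illustrative Python fragment with hypothetical helper names; the plane normals are estimated by least squares via SVD, which is our choice rather than a detail stated in the paper):

    import numpy as np

    def plane_normal(points):
        # Least-squares plane normal of a 3 x N point set: the least significant
        # left singular vector of the centered data.
        c = points - points.mean(axis=1, keepdims=True)
        return np.linalg.svd(c)[0][:, -1]

    def angle_deg(u, v):
        cosang = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

    def grid_cell_metrics(p00, p01, p11, p10):
        # Four reconstructed corners of one square cell; the ideal diagonal ratio
        # is 1 and the ideal angle between the diagonals is 90 degrees.
        d1, d2 = p11 - p00, p10 - p01
        return np.linalg.norm(d1) / np.linalg.norm(d2), angle_deg(d1, d2)

Eα1 is then the deviation of angle_deg(n1, n2) from 90 degrees, with n1 and n2 the two background plane normals, while Erat and Eα2 average the per-cell deviations of the diagonal ratio from 1 and the diagonal angle from 90 degrees.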

    7.4 Test on Franck Face Sequence

The Franck face sequence was downloaded from the European working group on face and gesture recognition (www-prima.inrialpes.fr/FGnet/). We selected 60 frames with various facial expressions for the test. The image resolution is 720 × 576, and there are 68 features tracked across the sequence, which were also downloaded from the same site. Figure 13 shows the models of four frames reconstructed by the proposed method. The different facial expressions are correctly recovered, though some points are not very accurate due to tracking errors; the result could be used for visualization and recognition. For analysis, the relative reprojection errors Erep2 produced by the different methods are listed in Table 4. In all these tests, the accuracy of the proposed method is fairly close to that of full perspective projection and much better than that under the affine assumption.

Fig. 13 Reconstruction of different facial expressions in the Franck face sequence. (a) Four frames from the sequence with the 68 tracked features overlaid on the last frame. (b) Front, side, and top views of the reconstructed VRML models with texture mapping. (c) The corresponding triangulated wireframes of the reconstructed models

Fig. 10 The histogram distributions of the reprojection errors by different algorithms in the real sequence tests. (a) Result of the stone post sequence. (b) Result of the fountain base sequence




Fig. 11 Reconstruction result of the fountain base sequence. (a) Three images from the sequence, where the tracked features with relative disparities are overlaid on the second and third images. (b) The reconstructed VRML model of the scene shown from different viewpoints with texture mapping. (c) The corresponding triangulated wireframe of the reconstructed model

Table 4 Performance comparison on the grid and face sequences

Method      Eα1    Eα2    Erat   Erep1   Erep2
Quasi       1.62   0.75   0.12   4.37    5.26
Affine      2.35   0.92   0.15   5.66    6.58
Persp       1.28   0.63   0.10   3.64    4.35
Quasi+LM    0.58   0.26   0.04   1.53    2.47
Affine+LM   0.96   0.37   0.07   2.25    3.19
Persp+LM    0.52   0.24   0.04   1.46    1.96

    8 Conclusion

In this paper, we proposed a quasi-perspective projection model and analyzed the projection errors of different projection models. We applied the proposed model to rigid and nonrigid factorization and elaborated the computation of the Euclidean upgrading matrix. The proposed method avoids the difficult problem of computing projective depths in perspective factorization. It is computationally simple and more accurate than the affine approximation. The model is suitable for the structure and motion factorization of a short sequence with small camera motions. Experiments demonstrated the improvements of our algorithm over existing techniques. It should be noted that the small-rotation assumption of the proposed model is not overly restrictive and is satisfied in many real applications: during image acquisition of an object to be reconstructed, we tend to control the camera movement so as to guarantee a large overlapping part, which also facilitates the feature tracking process.



Fig. 12 Reconstruction results of the dynamic grid sequence. (a) Three images from the sequence overlaid with the tracked features, with relative disparities shown in the second and third images; note the three moving objects. (b) The reconstructed VRML model of the structure shown from different viewpoints with texture mapping. (c) The corresponding triangulated wireframe of the reconstructed model

For a long sequence of images taken around an object, the assumption is violated. However, we can simply divide the sequence into several subsequences with small movements, then register and merge the results of the subsequences to reconstruct the structure of the whole object.

Acknowledgements The authors would like to thank the anonymous reviewers for their valuable comments and constructive suggestions. The work is supported in part by the Natural Sciences and Engineering Research Council of Canada, and the National Natural Science Foundation of China under Grant No. 60575015.

    Appendix 1: Proof of Proposition 5

Extended Cholesky Decomposition: Suppose $Q_n$ is an $n \times n$ positive semidefinite symmetric matrix of rank $k$. Then it can be decomposed as $Q_n = H_k H_k^T$, where $H_k$ is an $n \times k$ matrix of rank $k$. Furthermore, the decomposition can be written as $Q_n = \Pi_k \Pi_k^T$, with $\Pi_k$ an $n \times k$ vertical extended upper triangular matrix. The degree of freedom of the matrix $Q_n$ is $nk - \frac{1}{2}k(k-1)$, which is the number of unknowns in $\Pi_k$.

Proof Since $Q_n$ is an $n \times n$ positive semidefinite symmetric matrix of rank $k$, it can be decomposed by SVD as

\[ Q_n = U \Sigma U^T = U \,\mathrm{diag}(\sigma_1, \ldots, \sigma_k, 0, \ldots, 0)\, U^T \tag{69} \]

where $U$ is an $n \times n$ orthogonal matrix and $\Sigma$ is a diagonal matrix with $\sigma_i$ the singular values of $Q_n$. Thus we can immediately have

\[ H_k = U_{(1:k)} \,\mathrm{diag}\bigl(\sqrt{\sigma_1}, \ldots, \sqrt{\sigma_k}\bigr) = \begin{bmatrix} H_{ku} \\ H_{kl} \end{bmatrix} \tag{70} \]

such that $Q_n = H_k H_k^T$, where $U_{(1:k)}$ denotes the first $k$ columns of $U$, $H_{ku}$ denotes the upper $(n-k) \times k$ submatrix of $H_k$, and $H_{kl}$ denotes the lower $k \times k$ submatrix of $H_k$. By applying RQ decomposition to $H_{kl}$, we have $H_{kl} = \Pi_{kl} O_k$, where $\Pi_{kl}$ is an upper triangular matrix and $O_k$ is an orthogonal matrix.

Let us denote $H_{ku} O_k^T$ as $\Pi_{ku}$, and construct an $n \times k$ vertical extended upper triangular matrix $\Pi_k = \begin{bmatrix} \Pi_{ku} \\ \Pi_{kl} \end{bmatrix}$. Then we have $H_k = \Pi_k O_k$, and

\[ Q_n = H_k H_k^T = (\Pi_k O_k)(\Pi_k O_k)^T = \Pi_k \Pi_k^T \tag{71} \]

It is easy to verify that the degree of freedom of the matrix $Q_n$ (i.e., the number of unknowns in $\Pi_k$) is $nk - \frac{1}{2}k(k-1)$.

The proposition can be taken as an extension of the Cholesky decomposition to positive semidefinite symmetric matrices, whereas the Cholesky decomposition can only deal with positive definite symmetric matrices. □
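For reference, the constructive steps of this proof translate directly into the following numerical sketch (our illustration using numpy/scipy, not part of the paper); variable names mirror the proposition:

    import numpy as np
    from scipy.linalg import rq

    def extended_cholesky(Q, k):
        # Decompose an n x n PSD symmetric matrix Q of rank k as Q = Pi @ Pi.T,
        # with Pi an n x k vertical extended upper triangular matrix.
        U, sigma, _ = np.linalg.svd(Q)
        Hk = U[:, :k] * np.sqrt(sigma[:k])   # Q = Hk @ Hk.T, as in eq. (70)
        Pi_kl, Ok = rq(Hk[-k:, :])           # RQ-decompose the lower k x k block
        Pi_ku = Hk[:-k, :] @ Ok.T            # upper (n-k) x k block
        return np.vstack([Pi_ku, Pi_kl])     # Hk = Pi @ Ok, hence Q = Pi @ Pi.T

    # Quick check on a random rank-k PSD matrix:
    n, k = 6, 3
    rng = np.random.default_rng(0)
    A = rng.random((n, k))
    Q = A @ A.T
    Pi = extended_cholesky(Q, k)
    assert np.allclose(Pi @ Pi.T, Q)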

    Appendix 2: Proof of Proposition 9

Recovery of $H_r$: Suppose $H_l$ in (35) is already recovered. Let us construct a matrix $\tilde{H} = [H_l \,|\, \tilde{H}_r]$, where $\tilde{H}_r$ is an arbitrary 4-vector that is independent of the three columns of $H_l$. Then $\tilde{H}$ must be a valid upgrading matrix, i.e., $\tilde{M} = \hat{M}\tilde{H}$ is a valid Euclidean motion matrix, and $\tilde{S} = \tilde{H}^{-1}\hat{S}$ corresponds to a valid Euclidean shape matrix.


Proof Suppose the correct transformation matrix is $H = [H_l \,|\, H_r]$; then from

\[ S = H^{-1}\hat{S} = \begin{bmatrix} \mu_1\bar{X}_1 & \cdots & \mu_n\bar{X}_n \\ \mu_1 & \cdots & \mu_n \end{bmatrix} \tag{72} \]

we can obtain one correct Euclidean structure $[\bar{X}_1, \ldots, \bar{X}_n]$ of the object, under a certain world coordinate frame, by dehomogenizing the shape matrix $S$. The arbitrarily constructed matrix $\tilde{H} = [H_l \,|\, \tilde{H}_r]$ and the correct matrix $H$ are defined up to a $4 \times 4$ invertible matrix $G$ of the form

\[ H = \tilde{H}G, \qquad G = \begin{bmatrix} I_3 & g \\ 0^T & s \end{bmatrix} \tag{73} \]

where $I_3$ is a $3 \times 3$ identity matrix, $g$ is a 3-vector, $0$ is a zero 3-vector, and $s$ is a nonzero scalar. Under the transformation matrix $\tilde{H}$, the motion $\hat{M}$ and shape $\hat{S}$ are transformed to

\[ \tilde{M} = \hat{M}\tilde{H} = \hat{M}HG^{-1} = M \begin{bmatrix} I_3 & -g/s \\ 0^T & 1/s \end{bmatrix} \tag{74} \]

\[ \tilde{S} = \tilde{H}^{-1}\hat{S} = (HG^{-1})^{-1}\hat{S} = G(H^{-1}\hat{S}) = s \begin{bmatrix} \mu_1(\bar{X}_1+g)/s & \cdots & \mu_n(\bar{X}_n+g)/s \\ \mu_1 & \cdots & \mu_n \end{bmatrix} \tag{75} \]

We can see from (75) that the new shape $\tilde{S}$ is the original structure subjected to a translation $g$ and a scale $1/s$, which does not change the Euclidean structure. From (74) we have $\tilde{M}_{(1:3)} = M_{(1:3)}$, which indicates that the first three columns of the new motion matrix (corresponding to the rotation part) do not change, while the last column, which corresponds to the translation part, is modified in accordance with the translation and scale changes of the structure.

Therefore, the constructed matrix $\tilde{H}$ is a valid transformation matrix that upgrades the factorization from projective space to Euclidean space. □
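The proposition is also easy to verify numerically: for any choice of the last column, $G = \tilde{H}^{-1}H$ has the stated form. An illustrative check (names are ours, not from the paper):

    import numpy as np

    rng = np.random.default_rng(1)
    Hl = rng.random((4, 3))                  # shared first three columns
    H = np.column_stack([Hl, rng.random(4)])        # "correct" upgrading matrix
    H_tilde = np.column_stack([Hl, rng.random(4)])  # arbitrary independent last column

    G = np.linalg.inv(H_tilde) @ H           # should have the form [[I3, g], [0, s]]
    assert np.allclose(G[:3, :3], np.eye(3))
    assert np.allclose(G[3, :3], 0)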

    References

Bascle, B., & Blake, A. (1998). Separability of pose and expression in facial tracking and animation. In Proceedings of the international conference on computer vision (pp. 323–328).

Brand, M. (2001). Morphable 3D models from video. In Proceedings of IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 456–463).

Brand, M. (2005). A direct method for 3D factorization of nonrigid motion observed in 2D. In Proceedings of IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 122–128).

Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In Proceedings of IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 690–696).

Buchanan, A. M., & Fitzgibbon, A. W. (2005). Damped Newton algorithms for matrix factorization with missing data. In Proceedings of IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 316–322).

Chen, P. (2008). Optimization algorithms on subspaces: revisiting missing data problem in low-rank matrix. International Journal of Computer Vision, 80(1), 125–142.

Christy, S., & Horaud, R. (1996). Euclidean shape and motion from multiple perspective views by affine iterations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(11), 1098–1104.

Costeira, J., & Kanade, T. (1998). A multibody factorization method for independently moving objects. International Journal of Computer Vision, 29(3), 159–179.

Del Bue, A., Smeraldi, F., & de Agapito, L. (2004). Non-rigid structure from motion using nonparametric tracking and non-linear optimization. In IEEE workshop on articulated and nonrigid motion (ANM04), held in conjunction with CVPR 2004 (pp. 8–15).

Del Bue, A., Lladó, X., & de Agapito, L. (2006). Non-rigid metric shape and motion recovery from uncalibrated images using priors. In Proceedings of IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 1191–1198).

Han, M., & Kanade, T. (2000). Creating 3D models with uncalibrated cameras. In Proceedings of IEEE computer society workshop on the application of computer vision (WACV 2000).

Hartley, R. (1997). Kruppa's equations derived from the fundamental matrix. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(2), 133–135.

Hartley, R., & Schaffalitzky, F. (2003). PowerFactorization: 3D reconstruction with missing or uncertain data. In Australia-Japan advanced workshop on computer vision.

Hartley, R., & Vidal, R. (2008). Perspective nonrigid shape and motion recovery. In ECCV (1), Lecture notes in computer science: Vol. 5302 (pp. 276–289). Berlin: Springer.

Hartley, R., & Zisserman, A. (2004). Multiple view geometry in computer vision (2nd edn.). Cambridge: Cambridge University Press.

Heyden, A., & Åström, K. (1997). Euclidean reconstruction from image sequences with varying and unknown focal length and principal point. In IEEE conference on computer vision and pattern recognition (pp. 438–443).

Heyden, A., Berthilsson, R., & Sparr, G. (1999). An iterative factorization method for projective structure and motion from image sequences. Image and Vision Computing, 17(13), 981–991.

Li, T., Kallem, V., Singaraju, D., & Vidal, R. (2007). Projective factorization of multiple rigid-body motions. In IEEE conference on computer vision and pattern recognition.

Luong, Q., & Faugeras, O. (1997). Self-calibration of a moving camera from point correspondences and fundamental matrices. International Journal of Computer Vision, 22(3), 261–289.

Mahamud, S., & Hebert, M. (2000). Iterative projective reconstruction from multiple views. In IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 430–437).

Maybank, S., & Faugeras, O. (1992). A theory of self-calibration of a moving camera. International Journal of Computer Vision, 8(2), 123–151.

Oliensis, J., & Hartley, R. (2007). Iterative extensions of the Sturm/Triggs algorithm: convergence and nonconvergence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(12), 2217–2233.

Poelman, C., & Kanade, T. (1997). A paraperspective factorization method for shape and motion recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(3), 206–218.

Pollefeys, M., Koch, R., & Van Gool, L. (1999). Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters. International Journal of Computer Vision, 32(1), 7–25.

Quan, L. (1996). Self-calibration of an affine camera from multiple views. International Journal of Computer Vision, 19(1), 93–105.

Rabaud, V., & Belongie, S. (2008). Re-thinking non-rigid structure from motion. In IEEE conference on computer vision and pattern recognition.

Sturm, P. F., & Triggs, B. (1996). A factorization based algorithm for multi-image projective structure and motion. In European conference on computer vision (2) (pp. 709–720).

Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: a factorization method. International Journal of Computer Vision, 9(2), 137–154.

Torr, P. H. S., Zisserman, A., & Maybank, S. J. (1998). Robust detection of degenerate configurations while estimating the fundamental matrix. Computer Vision and Image Understanding, 71(3), 312–333.

Torresani, L., Yang, D. B., Alexander, E. J., & Bregler, C. (2001). Tracking and modeling non-rigid objects with rank constraints. In IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 493–500).

Torresani, L., Hertzmann, A., & Bregler, C. (2008). Nonrigid structure-from-motion: estimating shape and motion with hierarchical priors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5), 878–892.

Triggs, B. (1996). Factorization methods for projective structure and motion. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 845–851). San Francisco, California, USA.

Vidal, R., & Abretske, D. (2006). Nonrigid shape and motion from multiple perspective views. In European conference on computer vision (2). Lecture notes in computer science: Vol. 3952 (pp. 205–218). Berlin: Springer.

Vidal, R., Tron, R., & Hartley, R. (2008). Multiframe motion segmentation with missing data using PowerFactorization and GPCA. International Journal of Computer Vision, 79(1), 85–105.

Wang, G. (2006). A hybrid system for feature matching based on SIFT and epipolar constraints (Tech. Rep.). Department of Electrical and Computer Engineering, University of Windsor.

Wang, G., Tsui, H.-T., & Wu, J. (2008). Rotation constrained power factorization for structure from motion of nonrigid objects. Pattern Recognition Letters, 29(1), 72–80.

Wang, G., & Wu, Q. J. (2008a). Stratification approach for 3D Euclidean reconstruction of nonrigid objects from uncalibrated image sequences. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 38(1), 90–101.

Wang, G., & Wu, J. (2008b). Quasi-perspective projection with applications to 3D factorization from uncalibrated image sequences. In IEEE conference on computer vision and pattern recognition.

Wang, G., Wu, J., & Zhang, W. (2008). Camera self-calibration and three dimensional reconstruction under quasi-perspective projection. In Proceedings of the Canadian conference on computer and robot vision (pp. 129–136).

Xiao, J., & Kanade, T. (2005). Uncalibrated perspective reconstruction of deformable structures. In Proceedings of the international conference on computer vision (Vol. 2, pp. 1075–1082).

Xiao, J., Chai, J., & Kanade, T. (2006). A closed-form solution to non-rigid shape and motion recovery. International Journal of Computer Vision, 67(2), 233–246.

Yan, J., & Pollefeys, M. (2005). A factorization-based approach to articulated motion recovery. In IEEE conference on computer vision and pattern recognition (2) (pp. 815–821).

Yan, J., & Pollefeys, M. (2008). A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5), 865–877.
