
VISION-BASED ALGORITHM AND ROBUST FILTERING FOR STATE ESTIMATION OF AN UNCOOPERATIVE OBJECT IN SPACE

Vincenzo Pesce¹, Luca Losi¹, and Michèle Lavagna¹

¹Politecnico di Milano, Aerospace Science and Technology Department, via La Masa, 34, 20156 Milano, Italy, [email protected], [email protected], [email protected]

ABSTRACT

A Vision Based algorithm relying on a mono-camera as main sensor has been developed at PoliMi DAER to be used as the main measurement tool feeding the Navigation filter for relative pose estimation of uncooperative objects. The Vision Based algorithm combines a Perspective-n-Point (PnP) problem with Bundle Adjustment (BA), an optimization technique widely used in Computer Vision. BA is applied to a sparse map built offline and to the 2D features extracted from the camera images in order to refine the retrieved relative position and attitude, which are finally given as input to the navigation filter. A decoupled model is used for the relative translational and rotational dynamics. A first numerical validation of both the Vision Based routine and the filter is presented, using as input measurements obtained from synthetic images. The obtained results are finally analyzed and critically discussed.

Key words: Visual tracking; Filtering; State estimation; Uncooperative objects.

1. INTRODUCTION

Nowadays all the major space agencies and industries are working to enhance the automation capabilities of their spacecraft. Autonomous operations are extremely advantageous for space missions, with a wide range of possible applications; for example, autonomous relative navigation around unknown and uncooperative objects is particularly appealing. Precise pose and motion estimation of an uncooperative object, such as a Resident Space Object (RSO), has potential applications in the domains of space debris removal and on-orbit servicing. In the last decade, several works have tried to solve this problem. This paper investigates the potential of a hybrid approach, combining a vision-based algorithm for pose estimation with robust filtering methods. Vision Based systems represent nowadays a promising tool to obtain satellite relative trajectory estimation with a good level of accuracy. Since this topic was first born in the Computer Vision field, these techniques are increasingly being migrated to and developed for space applications.

Specifically developed algorithms and hardware (e.g. more powerful and multi-core CPUs) have appeared in recent years. Within this context, a Vision Based algorithm relying on a mono-camera as main sensor has been chosen and developed at PoliMi DAER, along with a Navigation filter, for relative pose estimation of uncooperative objects. The Vision Based algorithm is built to be used as the main measurement tool feeding the Navigation filter.
This paper is organized as follows: in the first two parts the Vision Based tracking and the Navigation filter are presented and described with their constituent blocks; in the third part a first numerical validation of both algorithms is presented, using as input measurements obtained from synthetic images. To validate and assess the software accuracy, synthetically generated trajectories, along with a point-cloud model simulating a target object and the corresponding image projections, have first been used. To conclude, the obtained results are analyzed and critically discussed.

2. SYSTEM OVERVIEW

The Vision Based algorithm implements a Visual Odometry-like routine [1][2]: it detects the target object features in the incoming images provided by the mono-camera; these are then matched to an already available on-board map (constituted by a mesh of 3D points, each one associated with a descriptor), building a set of 3D-to-2D correspondences. From this set, the so-called Perspective-n-Point (PnP) problem is formulated and solved within a RANSAC routine, in order to reject outliers (wrong matches between the target image and the on-board map) and obtain a first estimate of the relative pose, i.e. the relative position and attitude of the camera (and hence of the target with respect to the chaser) expressed as a rototranslation. Bundle Adjustment (BA), an optimization technique widely used in Computer Vision [3], is then applied to the map and to the 2D features (which constitute the measurements) in order to refine the obtained pose, which is finally given as input to the navigation filter. The on-board 3D sparse map of the target object is built on-ground with a dedicated algorithm, by correlating a 3D model of the uncooperative target with descriptors extracted from multiple images of it. This first estimate of the pose (relative position and attitude) of the observed uncooperative object is fed to the navigation filter.


A decoupled model is used for the relative translational and rotational dynamics. A linear model, including eccentricity, is implemented and used as the dynamics model inside the filter. Linear models allow the exploitation of robust Kalman filtering techniques, which are more reliable and robust to uncertainties. For the rotational dynamics, the non-linear equations would normally constrain the choice to a non-linear filter such as the Extended Kalman Filter; however, an alternative formulation, based on Lie groups, is investigated here. This formulation preserves the natural symmetry of the physical system without ambiguities or singularities.
Fig. 1 shows the global structure of the relative navigation system. The Vision Based tracking is represented with its main components: feature detection and matching, motion estimation, and optimization. The pose retrieved from this block is then given as input to the filter, which handles translations and rotations separately to produce the final relative state estimate (including velocities) of the target-chaser system.
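As a minimal illustration of this architecture, the following Python sketch wires the blocks of Fig. 1 together; all object and method names are hypothetical placeholders, not the authors' code.

```python
# Hypothetical wiring of the navigation loop of Fig. 1. The tracker and the
# two decoupled filters are placeholder objects; only the data flow is real.
def navigation_step(image, tracker, trans_filter, rot_filter):
    # Sec. 3: feature detection/matching, EPnP + RANSAC, motion-only BA.
    R_meas, t_meas = tracker.process(image)
    # Sec. 4.1: H-infinity filter on the relative translational state.
    rho, rho_dot = trans_filter.update(t_meas)
    # Sec. 4.2: minimum-energy filter on SO(3) for the relative attitude.
    R_hat, omega_hat = rot_filter.update(R_meas)
    return rho, rho_dot, R_hat, omega_hat
```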

3. VISION BASED TRACKING

In this section all the building blocks constituting the Vision Based tracking algorithm are described, along with the steps performed for every camera frame.

3.1. Features detection

Features are salient points that can be localized in each image coming from the camera, and different algorithms exist for their detection [4][5][6][7]. Feature detection is a fundamental step in the Vision Based tracking routine because it allows the correspondences with the available map to be built, which are then used to retrieve the relative pose of the target. The more robust the features, the more accurate the matching with the 3D model, resulting in a more accurate pose estimation.
A general in-orbit relative chaser-target trajectory implies modest scale variations between different frames, along with rotations; moreover, different lighting conditions are met while moving along the orbit. To be robust, the extracted features shall be invariant to each of these parameters. Moreover, the computational cost has to be as low as possible for real-time application. The ORB detector [7] has been selected for implementation considering these requirements. ORB is a feature detector and fast binary descriptor based on BRIEF [8] and FAST [6]; its features are extremely fast to compute, have good invariance to viewpoint and scale, and are resilient to different lighting conditions. At the moment the ORB detector is configured to build an 8-level grey-scale pyramid with a scale factor of 1.2 for each incoming image. An upper bound of 300 features (with their descriptors) is set to keep the computational cost low. Fig. 2 shows an example of extracted features on a generic image of the Suomi-NPP satellite. Green markers show the feature locations along with their pyramid level and orientation.

Replacing the current routine with background subtraction methods will be carefully evaluated in the future, both in terms of tracking performance and computational cost, in order to directly separate the spacecraft image from the background and extract a higher number of features on it, avoiding unnecessary zones.
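For reference, the detector configuration described above (8-level pyramid, scale factor 1.2, at most 300 features) can be reproduced, for instance, with OpenCV's ORB implementation; the use of OpenCV is an assumption, since the paper does not name the library.

```python
import cv2

# Sketch of the detection step: the parameter values are those quoted in
# the text; the choice of OpenCV as implementation is an assumption.
orb = cv2.ORB_create(nfeatures=300,    # upper bound on extracted features
                     scaleFactor=1.2,  # pyramid scale factor
                     nlevels=8)        # 8-level grey-scale pyramid

def detect_features(image_gray):
    """Detect ORB keypoints and compute their binary descriptors."""
    keypoints, descriptors = orb.detectAndCompute(image_gray, None)
    return keypoints, descriptors
```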

3.2. Features matching with 3D-model

The matching step consists in building a set of correspondences between the 2D features detected in the current frame and the points of the available target spacecraft 3D model. Matching is performed using the Hamming distance as similarity measure between the ORB descriptors extracted from the image and the descriptors of the 3D model. The Hamming distance can be computed very efficiently between corresponding binary descriptor strings, which makes the process very fast. For each descriptor from the frame, the matcher finds the two closest descriptors (by score) in the 3D model by exhaustive search. Once the set of matches is obtained, a ratio test according to [4] is applied: the distance to the closest descriptor is compared with the distance to the second-closest one; for reliable matching, correct matches need to have the closest descriptor significantly closer than the closest incorrect match. Only matches with a distance ratio lower than a fixed threshold are therefore retained; ratios between 0.7 and 0.8 ensure the best performance in practice. This is a heuristic way to immediately discard clear outliers and retain only good matches.
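A minimal sketch of this matching stage, again assuming an OpenCV-style implementation and a ratio threshold of 0.75 (within the range quoted above):

```python
import cv2

RATIO_THRESHOLD = 0.75  # within the 0.7-0.8 range suggested in the text

# Brute-force matcher with Hamming distance, suited to binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)

def match_to_model(frame_desc, model_desc):
    """Return (2D index, 3D index) pairs passing the ratio test of [4]."""
    # For each frame descriptor, find the two closest model descriptors.
    knn = matcher.knnMatch(frame_desc, model_desc, k=2)
    good = []
    for pair in knn:
        if len(pair) == 2 and pair[0].distance < RATIO_THRESHOLD * pair[1].distance:
            good.append((pair[0].queryIdx, pair[0].trainIdx))
    return good
```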

3.3. Motion estimation with PnP

The 2D-feature-to-3D-model correspondences obtained from the matching step are used in this sub-block to solve the "Perspective-n-Point" problem, namely the problem of estimating the pose of a calibrated camera given a set of n 3D points in the world frame and their corresponding 2D projections on the image.
The Vision Based tracking exploits the EPnP algorithm [9], which implements an efficient, non-iterative solution to the problem, applicable to both planar and non-planar 3D point configurations. The algorithm runs within a RANSAC [10] routine in order to be robust to the presence of outliers, and gives a first estimate of the relative position and orientation of the chaser camera with respect to the target.
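This step could be sketched, for instance, with OpenCV's solvePnPRansac; the library choice and the 2-pixel RANSAC reprojection threshold are assumptions.

```python
import cv2
import numpy as np

def estimate_pose(pts3d, pts2d, K):
    """First pose estimate via EPnP [9] inside a RANSAC [10] loop.

    pts3d: (N, 3) matched model points; pts2d: (N, 2) image features;
    K: 3x3 camera intrinsic matrix (assumed calibrated and undistorted).
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d, dtype=np.float64),
        np.asarray(pts2d, dtype=np.float64),
        K, None,
        flags=cv2.SOLVEPNP_EPNP,   # non-iterative EPnP solver
        reprojectionError=2.0)     # outlier threshold in pixels (assumed)
    R, _ = cv2.Rodrigues(rvec)     # rotation vector -> rotation matrix
    return ok, R, tvec, inliers
```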

3.4. Motion only Bundle Adjustment

Motion estimation by itself does not provide sufficient accuracy in trajectory reconstruction. Uncertainty in the extracted feature locations due to noise, the presence of outliers among the 2D-3D correspondences, and uncertainty in the built 3D model all cause the pose estimated by the EPnP algorithm to drift from the true trajectory. Bundle Adjustment (BA) [3] is therefore implemented to counteract these effects and to correct the trajectory estimate. The BA technique is designed to optimize the camera pose by minimizing the reprojection error of 3D points whose 2D coordinates are known in the images, taking the scene geometry as constraint.


Figure 1: Relative Navigation System architecture. The Vision Based tracking and the filter are shown with their components.

Figure 2: Example image with extracted ORB features. Green markers indicate the pyramid level and orientation of each feature.

g2o, a library specifically developed for pose graph optimization, is exploited as the tool for the BA implementation. A pose graph is a way of formulating the Simultaneous Localization and Mapping (SLAM) problem using graphs whose nodes represent map points and robot poses over time, and whose edges represent constraints between the poses (measurements). Once such a graph is constructed, BA is applied to find a configuration of the nodes which is maximally consistent with the measurements.
This approach has been implemented for the Vision Based tracking in the so-called "motion-only BA" version: at the end of each motion estimation step, the retrieved camera pose and the matched 3D-model points seen from the current frame are set as vertices of the graph, while the observed features, representing the measurements, define the edges. The obtained graph is optimized keeping the map points fixed and improving only the camera pose estimate, which is then given as input to the navigation filter.
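The paper implements this step with g2o; the stand-in sketch below conveys the same idea with a generic least-squares solver, refining only the 6-DOF camera pose while keeping the map points fixed. It is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import least_squares
import cv2

def motion_only_ba(pts3d, pts2d, K, rvec0, tvec0):
    """Motion-only BA sketch: minimize the reprojection error over the
    camera pose only, keeping the 3D map points fixed (cf. Sec. 3.4)."""
    pts3d = np.asarray(pts3d, dtype=np.float64)
    pts2d = np.asarray(pts2d, dtype=np.float64)

    def residuals(pose):
        rvec, tvec = pose[:3], pose[3:]
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
        return (proj.reshape(-1, 2) - pts2d).ravel()

    x0 = np.hstack([np.ravel(rvec0), np.ravel(tvec0)])
    sol = least_squares(residuals, x0, method="lm")  # Levenberg-Marquardt
    return sol.x[:3], sol.x[3:]                      # refined rvec, tvec
```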

4. FILTERING

Once the images are processed and the measurements are available, it is possible to perform a filtering procedure. In this work, decoupled filters are used: one for the translation and another for the relative attitude estimation.

4.1. Translation filter

For the translational part of the filter, an H-∞ filter has been used. This kind of filter guarantees robustness by minimizing the worst-case estimation error [11]. Let us consider a linear time-invariant system:

\[
\begin{aligned}
x_{k+1} &= F_k x_k + B w_k \\
y_k &= H_k x_k + D v_k
\end{aligned}
\tag{1}
\]

with x_k the state vector, w_k and v_k the process and measurement noise respectively, and y_k the measurement output. Similarly to the Kalman Filter, the correction equation can be defined as:

\[
\hat{x}_{k+1} = F_k \hat{x}_k + F_k K_k \left( y_k - H_k \hat{x}_k \right)
\tag{2}
\]

However, K does not have the same expression as in the Kalman Filter. In particular, we want to find K such that

\[
\left\| T_{ew} \right\|_\infty < \frac{1}{\theta},
\]

where T_{ew} represents the difference between the predicted and real state and θ is a tuning parameter. To find the value of K, a Riccati-like equation can be written:

\[
P_{k+1} = F_k P_k \left[ I - \theta P_k + H_k^T R_k^{-1} H_k P_k \right]^{-1} F_k^T + Q_k
\tag{3}
\]

and the expression of the gain is:

\[
K_k = P_k \left[ I - \theta P_k + H_k^T R_k^{-1} H_k P_k \right]^{-1} H_k^T R_k^{-1}
\tag{4}
\]

A linearized model for the dynamics has been selected; in particular, the authors used the formulation by Yamanaka and Ankersen [12]. Their derivation of the state transition matrix is very advantageous for the implementation of filtering techniques and control systems, since an expression for the F matrix of Eq. (1) is directly derived. The details of the implementation are not presented here; they can be found in [12]. On the other hand, the measurement model is easily derived, since the relative position between the two centers of mass is directly part of the state: for this reason, the H matrix in Eq. (1) is equal to the identity.
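For concreteness, one step of the filter can be written directly from Eqs. (2)-(4); the numerical values of Q, R and θ are left open, since the paper does not report them.

```python
import numpy as np

def hinf_step(x, P, y, F, H, Q, R, theta):
    """One H-infinity filter step, Eqs. (2)-(4). F would come from the
    Yamanaka-Ankersen state transition matrix [12]; H is the identity
    here since the relative position is measured directly."""
    n = P.shape[0]
    Rinv = np.linalg.inv(R)
    # Bracketed term shared by Eqs. (3) and (4).
    M = np.linalg.inv(np.eye(n) - theta * P + H.T @ Rinv @ H @ P)
    K = P @ M @ H.T @ Rinv                 # Eq. (4): robust gain
    x_next = F @ x + F @ K @ (y - H @ x)   # Eq. (2): correction
    P_next = F @ P @ M @ F.T + Q           # Eq. (3): Riccati-like update
    return x_next, P_next
```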

4.2. Rotation filter

For the rotational part, a minimum-energy filter on the Lie group is implemented. Minimum-energy filtering was first introduced by Mortensen [13].


Recently, it has been specialized to attitude estimation on the Special Orthogonal Group SO(3) by Zamani et al. [14]. Our implementation is derived from the one presented in [15]; however, a slightly different formulation is proposed here. In fact, a simplification can be introduced by not considering the real dynamics of the system. The main advantage of this approach is that it is not necessary to know the exact value of the inertia matrix of the target, and the filter can be expressed directly in the camera reference frame. In particular:

\[
\begin{aligned}
\dot{R} &= R \left( \omega(t) \right)_\times \\
\dot{\omega} &= 0 + B\delta
\end{aligned}
\tag{5}
\]

This leads to the following filter formulation:

\[
\begin{aligned}
&\hat{R}(t_0) = R_0, \qquad \hat{\omega}(t_0) = \omega_0, \qquad K(t_0) = K_0, \\
&\dot{\hat{R}} = \hat{R}\,\big( \hat{\omega}(t) + K_{11}\, r_R + K_{12}\, r_\omega \big)_\times, \\
&\dot{\hat{\omega}} = K_{21}\, r_R + K_{22}\, r_\omega, \\
&\dot{K}(t) = -\alpha K + A K + K A^T - K E K + G Q^{-1} G^T - W K - K W^T
\end{aligned}
\tag{6}
\]

with

\[
\begin{aligned}
r_t &= \begin{bmatrix} r_R \\ r_\omega \end{bmatrix}
     = \begin{bmatrix} -u_1\,(\hat{r}_1 \times r_1) - u_2\,(\hat{r}_2 \times r_2) \\ 0 \end{bmatrix},
\qquad
\hat{r}_i = \hat{R}^T \mathring{r}_i, \quad r_i = R^T \mathring{r}_i + d_i\,\varepsilon, \\
A &= \begin{bmatrix} -\hat{\omega}_\times & I \\ 0 & 0 \end{bmatrix},
\qquad
E = \begin{bmatrix} \sum_{i=1}^{2} u_i \big( (\hat{r}_i)_\times (r_i)_\times + (r_i)_\times (\hat{r}_i)_\times \big)/2 & 0 \\ 0 & 0 \end{bmatrix}, \\
G Q^{-1} G^T &= \begin{bmatrix} 0 & 0 \\ 0 & B Q^{-1} B^T \end{bmatrix},
\qquad
W = \begin{bmatrix} \tfrac{1}{2}\big( K_{11}\, r_R + K_{12}\, r_\omega \big)_\times & 0 \\ 0 & 0 \end{bmatrix}
\end{aligned}
\tag{7}
\]

where \mathring{r}_i denotes the i-th known reference direction.
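For illustration, one explicit-Euler integration step of Eqs. (6)-(7) could be sketched as follows; the integrator, the first-order attitude update, and the interface are assumptions, not the authors' implementation.

```python
import numpy as np

def skew(v):
    """Map a 3-vector v to the skew-symmetric matrix (v)x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def min_energy_step(R, w, K, r_meas, r_ref, u, alpha, GQG, dt):
    """One explicit-Euler step of Eqs. (6)-(7).

    R, w: current attitude / angular-velocity estimates; K: 6x6 gain;
    r_meas[i]: i-th measured direction; r_ref[i]: i-th reference
    direction; u: measurement weights; GQG: the G Q^{-1} G^T term."""
    # Residual of Eq. (7): only the attitude part r_R is non-zero.
    r_hat = [R.T @ d for d in r_ref]
    r_R = -sum(u[i] * np.cross(r_hat[i], r_meas[i]) for i in range(len(u)))
    r_w = np.zeros(3)
    K11, K12, K21, K22 = K[:3, :3], K[:3, 3:], K[3:, :3], K[3:, 3:]
    # State propagation of Eq. (6); the matrix exponential is replaced by
    # its first-order expansion, so R should be re-orthonormalized.
    R_next = R @ (np.eye(3) + dt * skew(w + K11 @ r_R + K12 @ r_w))
    w_next = w + dt * (K21 @ r_R + K22 @ r_w)
    # Gain Riccati equation of Eq. (6), with A, E, W from Eq. (7).
    Z = np.zeros((3, 3))
    A = np.block([[-skew(w), np.eye(3)], [Z, Z]])
    E11 = sum(u[i] * (skew(r_hat[i]) @ skew(r_meas[i]) +
                      skew(r_meas[i]) @ skew(r_hat[i])) / 2.0
              for i in range(len(u)))
    E = np.block([[E11, Z], [Z, Z]])
    W = np.block([[skew(K11 @ r_R + K12 @ r_w) / 2.0, Z], [Z, Z]])
    K_dot = (-alpha * K + A @ K + K @ A.T - K @ E @ K + GQG
             - W @ K - K @ W.T)
    return R_next, w_next, K + dt * K_dot
```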

5. SIMULATION SCENARIO

In order to validate and assess the performance of the Vision Based tracking, of the filter, and of the whole navigation routine together, a preliminary test campaign has been performed on a simulated scenario. A MEO orbit has been selected for the chaser spacecraft, with an eccentricity of 0.17 and a semi-major axis of 8790 km. The relative reference dynamics is simulated by directly integrating the nonlinear equations of relative motion, without considering any orbital perturbation. The assumed initial conditions are:

\[
\rho_0 = [50,\ 0,\ 0]\ \mathrm{m}; \qquad
\dot{\rho}_0 = [0,\ -0.1,\ 0]\ \mathrm{m/s}
\tag{8}
\]

expressed in the local-vertical, local-horizontal (LVLH) reference frame fixed to the chaser spacecraft center of mass, with x directed radially outward, z normal to the chaser's orbital plane, and y completing the triad. These initial conditions have been chosen to obtain an in-plane elliptical motion of one spacecraft with respect to the other; this trajectory can be representative of a monitoring or close-approach phase. It has to be underlined that the imperfect absolute state determination of the chaser spacecraft is taken into account: in particular, the true anomaly, which is an input of the relative dynamics model, is obtained from an absolute state corrupted by noise. The noise associated with the absolute position is modeled as a normal distribution with zero mean and a standard deviation of 1 × 10⁻¹ km, and the noise on the velocity is assumed to be a normal distribution with zero mean and a standard deviation of 1 × 10⁻³ km/s. For the relative attitude dynamics, a torque-free tumbling motion has been imposed on the simulated target spacecraft. The motion has been simulated using the classical Euler equations for a rigid body, imposing the initial conditions ω₀ = [1, 0.1, 0.3] deg/s. These values have been used to reproduce a generic tumbling motion with significant angular velocity, representative of a tumbling space debris. The target spacecraft has been modelled with a simple shape, taking the NASA Suomi-NPP satellite as reference. A set of 38 uniformly distributed points has been considered and constitutes the 3D model used by the Vision Based tracking. The set of images provided to the Navigation algorithm has instead been obtained directly by projecting the 3D-model points onto the image plane. To account for the uncertainty in feature detection that arises due to the presence of noise in real images, a uniformly distributed noise between 0 and 2 pixels has been added to the 2D feature locations; this error, at its maximum level, already generates some outliers that are consequently inserted into the set of 2D-3D correspondences. Fig. 3 shows the true 3D model of the spacecraft along with the selected features, both with zero noise.

Figure 3: 3D spacecraft model used for the simulations, along with the considered features.

Fig. 4 shows instead a short sequence of the generated synthetic images with the corresponding features, from which a portion of the spacecraft tumbling motion can be observed.
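A sketch of how such synthetic measurements can be generated is given below; the intrinsic matrix K and the exact noise model (here, a perturbation of uniformly distributed magnitude between 0 and 2 pixels applied in a random direction) are assumptions beyond what the text states.

```python
import numpy as np

def synthetic_features(pts_model, R, t, K, noise_px=2.0, rng=None):
    """Project the 3D-model points through a pinhole camera and perturb
    the resulting 2D features with noise of uniformly distributed
    magnitude in [0, noise_px] pixels (assumed noise model)."""
    rng = rng or np.random.default_rng()
    pts_cam = (R @ pts_model.T).T + t        # target frame -> camera frame
    uvw = (K @ pts_cam.T).T
    uv = uvw[:, :2] / uvw[:, 2:3]            # perspective division
    mag = rng.uniform(0.0, noise_px, len(uv))
    ang = rng.uniform(0.0, 2.0 * np.pi, len(uv))
    uv += np.column_stack([mag * np.cos(ang), mag * np.sin(ang)])
    return uv
```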


Figure 4: Simulated target spacecraft images with features (six frames of the tumbling sequence).

6. RESULTS

In this section, the results of the pose determination algorithm and of the overall relative state determination are presented. The estimation errors are computed as follows. The position error is computed as:

\[
e_\rho = \sqrt{ (\hat{x}_i - x_i)^2 + (\hat{y}_i - y_i)^2 + (\hat{z}_i - z_i)^2 }
\tag{9}
\]

where \hat{x}, \hat{y}, \hat{z} are the estimated position components. Similarly, the velocity error is:

\[
e_{\dot{\rho}} = \sqrt{ (\dot{\hat{x}}_i - \dot{x}_i)^2 + (\dot{\hat{y}}_i - \dot{y}_i)^2 + (\dot{\hat{z}}_i - \dot{z}_i)^2 }
\tag{10}
\]

The relative attitude error is computed as:

\[
e_R = \arccos\!\left( 1 - \frac{\operatorname{tr}\left( I - \hat{R}_i^T R_i \right)}{2} \right)
\tag{11}
\]

with \hat{R} being the estimated rotation matrix.
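These metrics translate directly into code; in the sketch below, the clipping of the arccos argument against round-off is an implementation detail not in the paper.

```python
import numpy as np

def norm_error(est, true):
    """Eqs. (9)-(10): Euclidean norm of the position (or velocity) error."""
    return np.linalg.norm(np.asarray(est) - np.asarray(true))

def attitude_error_deg(R_est, R_true):
    """Eq. (11): angular distance between estimated and true rotations."""
    c = 1.0 - np.trace(np.eye(3) - R_est.T @ R_true) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```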

The results of the translational H-∞ filter are presented in this paragraph, along with the pose determination results. In particular, the estimation errors corresponding to the 2-pixel noise case with a measurement frequency of 10 Hz are considered. Fig. 5 shows the estimation error of the position; Fig. 6 shows the error of the corresponding relative translational velocity estimates for the H-∞ filter.

The advantage of having a filter downstream of the image processing is evident: the overall position error is strongly reduced and always stays below 0.4 m. Moreover, very good results are obtained for the estimation of the relative translational velocity.

Similarly to what has been done for the translational filter, the results of the rotational filter are presented below.

Figure 5: Relative position error e [m] versus time [s].

Figure 6: Relative velocity error versus time [s] (logarithmic scale).

Figure 7: Relative attitude error e_R [deg] versus time [s].


Fig. 7 shows the rotation filtering results. Once again, the filter improves the measurements from the Vision Based tracking quite significantly: the overall rotation error always stays below 0.1 deg.

7. CONCLUSION

This paper investigates the possibility of combining image processing techniques with robust filtering algorithms for relative state estimation of uncooperative space objects. The presented results show how the implemented solution allows a precise estimation of the relative position, velocity and attitude to be obtained. The performance of the image processing algorithms has been preliminarily assessed; the obtained results are very promising, but further analysis with real images is necessary. An extensive campaign is currently ongoing to test both the image processing and the filters in different scenarios. The presence of filters for the translational and rotational dynamics is a clear advantage for the relative state estimation; a more extensive analysis is also needed for the filtering part, to assess its robustness and efficiency over different simulation conditions.
One of the next development steps will be the validation of the proposed algorithms with the experimental facility currently under setup at the Politecnico di Milano Department of Aerospace Science and Technologies. The facility comprises a 7-DOF Mitsubishi PA-10 robotic arm, which will be used to replicate the relative motion between a chaser and an uncooperative target. A 3D-printed satellite mock-up will be installed on the robotic arm end-effector, with a fixed camera acquiring images of it during motion. This setup will allow the validation of both image processing and pose determination algorithms under more realistic conditions, up to TRL 4.

REFERENCES

1. D. Scaramuzza and F. Fraundorfer, "Visual odometry [tutorial]," IEEE Robotics & Automation Magazine, vol. 18, no. 4, pp. 80–92, 2011.

2. F. Fraundorfer and D. Scaramuzza, "Visual odometry: Part II: Matching, robustness, optimization, and applications," IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 78–90, 2012.

3. B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, "Bundle adjustment: a modern synthesis," in Vision Algorithms: Theory and Practice. Springer, 1999, pp. 298–372.

4. D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

5. H. Bay, T. Tuytelaars, and L. Van Gool, "SURF: Speeded up robust features," in Computer Vision – ECCV 2006. Springer, 2006, pp. 404–417.

6. E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," in Computer Vision – ECCV 2006. Springer, 2006, pp. 430–443.

7. E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: an efficient alternative to SIFT or SURF," in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 2564–2571.

8. M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary robust independent elementary features," in Computer Vision – ECCV 2010. Springer, 2010, pp. 778–792.

9. V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.

10. M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.

11. D. Simon, Optimal State Estimation: Kalman, H-infinity, and Nonlinear Approaches. John Wiley & Sons, 2006.

12. K. Yamanaka and F. Ankersen, "New state transition matrix for relative motion on an arbitrary elliptical orbit," Journal of Guidance, Control, and Dynamics, vol. 25, no. 1, pp. 60–66, 2002.

13. R. Mortensen, "Maximum-likelihood recursive nonlinear filtering," Journal of Optimization Theory and Applications, vol. 2, no. 6, pp. 386–394, 1968.

14. M. Zamani, J. Trumpf, and R. Mahony, "Minimum-energy filtering for attitude estimation," IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2917–2921, 2013.

15. A. Saccon, J. Trumpf, R. Mahony, and A. P. Aguiar, "Second-order-optimal minimum-energy filters on Lie groups," IEEE Transactions on Automatic Control, vol. 61, no. 10, pp. 2906–2919, 2016.