Robotic Home Sentinel
David Cruz Veiga Álvares Pereira
Thesis to obtain the Master of Science Degree in
Electrical and Computer Engineering
Supervisor: Professor José António da Cruz Pinto Gaspar
Co-Supervisor: Doctor Ricardo Jorge Santos Ferreira
Examination Committee
Chairperson: Professor João Fernando Cardoso Silva Sequeira
Supervisor: Professor José António da Cruz Pinto Gaspar
Member of the Committee: Professor Rodrigo Martins de Matos Ventura
April 2014
Contents

Agradecimentos
Resumo
Abstract
1 Introduction
1.1 Related Work
1.2 Problem Formulation
1.3 Proposed Solution
1.4 Thesis Structure and Contributions
1.5 Acknowledgments
2 Background
2.1 Camera Model
2.1.1 Perspective Projection
2.1.2 Extrinsic Parameters
2.1.3 Intrinsic Parameters
2.1.4 Depth Image to 3D Points
2.1.5 Back-projection
2.2 Integration of measurements
2.2.1 The Discrete Kalman Filter Algorithm
2.2.2 Extended Kalman Filter
2.2.3 Other Types of Filters
2.3 Application to the thesis context
3 Navigation using Color-Depth Cameras
3.1 Occupancy Grids
3.1.1 Motion Estimation
3.1.2 Map Building
3.1.3 Loop Closure
3.2 Motion Planning
3.2.1 Potential Fields
3.2.2 Randomized Methods for Path Planning
3.2.3 Probabilistic Roadmaps
3.2.4 Rapidly Exploring Random Trees
3.2.5 Comparison of Trajectory Planning Methods
4 Navigation based on Monocular Vision
4.1 Depth Information Not Directly Available
4.2 Iterative Estimation of Tubular Structure and Camera Motion
4.2.1 State Vector
4.2.2 System Dynamics
4.2.3 Observation Model
4.3 Dewarping
4.3.1 Dewarping a VRML Tubular Shape
4.3.2 Dewarping a Real Tubular Shape
4.4 Rectangular Section for a Corridor
4.4.1 Results obtained for the synthetic case
4.4.2 Conclusion
5 Experiments
5.1 The Experimental Setup
5.2 Single Image Global Localization
5.3 Motion Estimation
5.4 Occupancy Grid
5.5 Patrol #1 - Run with no obstacles
5.6 Patrol #2 - Unknown blocked passage
5.7 Mapping and Navigation with Monocular Vision
6 Conclusion and Future Work
Agradecimentos
I would like to thank my supervisor, Professor José António Gaspar, and my co-supervisor, Doctor Ricardo Ferreira, for all the availability and support provided throughout this dissertation.
Finishing the degree in Electrical and Computer Engineering would not have been possible without the ever-present support of my family and friends. To all of them, many thanks for having influenced me in some way.
Resumo
This dissertation addresses the problem of mapping and navigating a mobile robot in indoor environments using artificial vision. The first sensor considered is an RGBD camera. The depth information makes it possible to lift to 3D the characteristic points selected in the image ("SIFT features"). A set of three or more 3D points, matched between consecutive images, allows estimating the motion of a camera as a rigid transformation (Procrustes problem). Besides motion estimation, the depth images allow computing the free space in front of the robot. From several free-space readings an occupancy grid can be built, which then allows finding trajectories for navigation.
As an alternative to an RGBD camera, we also consider the use of an RGB/grey camera. The combination of several images acquired over time, using a simultaneous localization and mapping (SLAM) methodology, allows recovering 3D information. In particular, we consider a geometric model for the scene, assuming a tubular structure with a rectangular section (corridor). Back-projection maps 2D points in the image plane to 3D in the camera frame, assuming the calibration is known. Over the map built, as in the previous case, a path planner is applied, yielding in the end a path from the current position to the desired one.
Both navigation methodologies were tested in virtual-reality environments. The results obtained, compared with the simulation data (ground truth), are promising.
Keywords: Camera Model, EKF, SLAM, Path Planner, Rigid Transformation.
Abstract
In this dissertation we approach the problem of mapping and navigating a mobile robot in indoor environments using artificial vision. A color-depth camera is the first sensor considered. Using the depth information it is possible to infer the 3D location of selected points in the image (visual features). A set of at least three 3D points matched between consecutive images allows estimating the motion of a camera as a rigid transformation (Procrustes problem). In addition to motion estimation, depth images also allow observing the free space in front of the robot. When the robot moves, the free-space information can be integrated into an occupancy grid. The occupancy grid is then used to find paths for navigation.
As an alternative to an RGBD camera, we also consider the use of a simple RGB camera. The combination of several images obtained over time, using a SLAM methodology, allows recovering 3D information. In particular, we consider a geometric model for the scene assuming a tubular structure with a rectangular section (corridor). Back-projection maps 2D points in the image plane to 3D in the camera frame, assuming the calibration is known. Over the map built, a path planner is applied, resulting in a trajectory from the current position to the desired one.
Experiments were conducted involving both methodologies in virtual-reality environments. Comparison with ground truth shows promising results.
Keywords: Camera Model, EKF, SLAM, Path Planner, Rigid Transformation.
List of Figures

1.1 Typical sentinel, visiting places 1-8.
1.2 Basic Diagram of the proposed solution.
2.1 Perspective Projection
2.2 Frontal Model
2.3 The discrete Kalman filter cycle.
3.1 Diagram detailing the steps for the navigation.
3.2 Various steps of the occupancy grid mapping process
3.3 Profiles for the estimated motion of the camera (translation and orientation).
3.4 Example of a Potential Field. Image taken from [24]
3.5 Example of a Roadmap. Image taken from [20]
3.6 Example of an expanding tree. Image taken from [15]
3.7 The RRT method
3.8 The RRT-Connect method
3.9 (a) Map 1 (b) Map 2 (c) Map 3 (d) Map 4. The red circle is the starting point and the green star is the goal point. All the maps solved with different methods: the second row is the Potential Fields method, the third row the Probabilistic Roadmaps, and the bottom row the Rapidly exploring Random Trees.
4.1 Basic steps in mapping and motion estimation for the Color-depth vs Visual SLAM.
4.2 Tubular structure used for testing the system behavior.
4.3 Creating a mosaic with the interior texture of the tube.
4.4 (a) 3D Coordinates of pixels in the image inside the tube. (b) Relation between the angle of the coordinate in the closed image and the vertical pixel in the mosaic. (c) Pixels in the mosaic.
4.5 (a) VRML setup. (b) View of the camera inside the tube. (c) Dewarping of the tube for a single step. (d) Result of dewarping several steps.
4.6 (a) Real setup. (b) Inside view of the tube. (c) Dewarping of the image for one step. (d) Result of dewarping several steps.
4.7 Back-projection Model for Rectangular Section.
4.8 Results for the synthetic corridor.
5.1 VRML model used for visualization.
5.2 Various consecutive steps of the POV of the robot inside the virtual world.
5.3 Map with robot ground truth locations (green crosses) and estimated locations (red circles) based on query and database images (a). The query images for the various steps (5, 81, 100 and 31) are on the left and the retrieved matching images on the right (b, c, d, e).
5.4 Pre-determined path used by the robot to go through the environment and map it.
5.5 Color-depth camera used for motion estimation.
5.6 All the estimated positions for the camera, resulting from the motion estimation.
5.7 All the steps to create a map used for navigation.
5.8 POV of the robot when there is an obstacle (5.8b) and when there is none (5.8a).
5.9 All the missions the sentinel function created. One for each goal point.
5.10 All the missions the sentinel function created. There is one extra mission because of the obstacle present in the environment.
5.11 Results for one corridor.
5.12 Map representation for one and two corridors.
5.13 Result for the VSLAM experiments.
List of Tables

3.1 Average values for 10 runs
Chapter 1
Introduction
Video surveillance is continuously increasing its place within human environments. A wide range of application scenarios is currently feasible because of the level of maturity reached by the main enabling technologies. In particular, the development of robots capable of working as extensions of human operators in the field is envisaged, namely doing patrols, recognizing and tracking intruders, and collaborating with other robots or even with humans via tele-operation and videoconference.
Applications include monitoring and operations in hazardous or remote environments (ocean, space, contaminated areas, areas destroyed by natural disasters, etc.), in industrial or civil engineering structures (pipeline monitoring and surveillance, bridges, dams), and in services (buildings, public areas, traffic monitoring and surveillance). Surveillance robots must have high levels of autonomy. This implies sensing and moving within the environment, working for long periods of time without human intervention, and avoiding situations harmful to people, pets, property, or the robot itself.
The consumer market is growing in size and in acceptance of this type of technology in the home. A robotic home sentinel would tour the house on demand, when the owner is away on vacation or at work, and take pictures at pre-specified locations. The idea is illustrated in Figure 1.1, where a Wowwee Rovio [27] is used to visit a list of locations.
Figure 1.1: Typical sentinel, visiting places 1-8.
Recent vision sensors, namely the Microsoft Kinect [25], allow effectively measuring the empty space in front of a robot. The Kinect depth sensor is a structured-light 3D scanner. Its infrared light source projects light patterns onto the space ahead. The reflected light is received by an infrared sensor and processed to extract geometric information about object surfaces, from which the distance to the camera's image plane is computed. The Kinect captures both depth and color images simultaneously, at a frame rate of up to 30 fps. Compared with traditional laser scanners, it is a much cheaper alternative. The Kinect has thus attracted the attention of researchers and developers, and is used in indoor mapping, surveillance, robot navigation and forensics applications, where accuracy requirements are less strict. This MSc thesis is therefore concerned with using Kinect-like sensors for freely navigating in a cluttered home environment while reporting pre-specified scene views.
Alternatively, we also explore the use of simpler conventional color cameras. Depth information has in this case to be inferred in order to navigate the robot safely. It is an objective of this thesis to understand and assess the advantages and disadvantages of color vs color-and-depth cameras.
1.1 Related Work
As mentioned before, sensing and moving in an unknown cluttered environment is nowadays facilitated by the recent introduction in the market of the MS Kinect camera. The Kinect camera provides not only visual information of the scenario, but also depth information. Some recent research works show that, by choosing the proper scene representations, it is possible to do data acquisition, processing and integration with a standard PC. One such reference is the work by Wurm et al [28], where a mobile robot maps a number of campuses. Another reference is the work by Shen et al [19], where a flying robot (quadrotor) maps a multi-floor scenario using a Kinect camera.
Considering ground-plane wheeled robots, the depth information provided by the camera yields in particular a horizontal stripe in front of the robot, and therefore indicates occupancy. This information is usually integrated into occupancy grids. The principal idea of the occupancy grid [9] is to represent a map of the environment as an evenly spaced field of binary random variables, each representing the presence or absence of an obstacle at that location in the environment. Occupancy grid algorithms compute approximate posterior estimates for these random variables.
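The posterior over each cell is typically maintained in log-odds form, so that integrating a new observation costs a single addition per cell. The sketch below is a minimal illustration of this update, not code from [9]; the inverse-sensor-model probabilities (0.7/0.3) and function names are assumptions made for the example.

```python
import numpy as np

# Illustrative inverse sensor model: log-odds increments for a cell that a
# range measurement reports as occupied (hit) or as traversed free space.
L_OCC = np.log(0.7 / 0.3)
L_FREE = np.log(0.3 / 0.7)

def update_cell(log_odds, hit):
    """One recursive Bayesian update of a single grid cell (log-odds form)."""
    return log_odds + (L_OCC if hit else L_FREE)

def occupancy_probability(log_odds):
    """Recover the posterior occupancy probability from the log-odds value."""
    return 1.0 - 1.0 / (1.0 + np.exp(log_odds))

# Starting from the uninformed prior p = 0.5 (log-odds 0), repeated
# "occupied" observations drive the posterior toward 1.
cell = 0.0
for _ in range(3):
    cell = update_cell(cell, hit=True)
```

Because the update is additive, an occupied and a free observation of equal strength cancel exactly, which is what makes the representation cheap to maintain over long runs.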
Given the occupancy grid, one still has to find trajectories that can be followed by the robot. The Robotics Toolbox [6] is a software package developed by Peter Corke that addresses several of the topics discussed in this work, in particular trajectory planning.
Considering conventional color cameras, depth information is not immediately available. Inferring depth can be done using tools first derived for range sensing. The Simultaneous Localization and Mapping (SLAM) methodology, first introduced by Randall Smith and Peter Cheeseman [21], and coined by J. Leonard and H.F. Durrant-Whyte [8], addresses the challenging problem of simultaneously predicting the location of an autonomous unmanned vehicle and mapping its surrounding environment, using the on-board sensors for data gathering and feeding the observation and motion models at every instant.
Extensive work has already been done in terms of monocular SLAM. In [7] a real-time algorithm is presented which can recover the 3D trajectory of a monocular camera moving rapidly through a previously unknown scene. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. In [4] a new parametrization is presented for point features within monocular simultaneous localization and mapping that permits efficient and accurate representation of uncertainty during undelayed initialization and beyond, all within the standard extended Kalman filter (EKF). For camera motion estimation an important reference is [29], where a kinematic model-based approach for the estimation of 3D motion parameters from a sequence of noisy stereo images is discussed. The approach is based on representing the constant-acceleration translational motion and constant-precession rotational motion in the form of a bilinear state-space model, using standard rectilinear states for translation and quaternions for rotation. Other references are [2] and [1], where the problem considered involves the use of a sequence of noisy monocular images of a three-dimensional moving object to estimate both its structure and kinematics. The object is assumed to be rigid, and its motion is assumed to be smooth. A set of object match points is assumed to be available, consisting of fixed features on the object, the image-plane coordinates of which have been extracted from successive images in the sequence.
This work started as the continuation of the work developed by João Tomaz [23] in his MSc thesis, which followed a SLAM approach. In his work he estimated the motion of a camera inside a tubular structure. He reconstructed a tubular shape by selecting and registering distinctive visual features along it. For that he also assumed a priori knowledge of a tubular model for the environment structure.
1.2 Problem Formulation
Using the available sensor, we must be able to characterize the surroundings at every step. The ultimate goal is to map the environment and plan the path the robot takes to reach its goal, first using a Kinect-like camera (RGB+D) and then a normal (RGB) camera. The robot uses only visual odometry and, during navigation, has to detect and bypass obstacles that may appear in its way.
1.3 Proposed Solution
The proposed solution is based on finding and tracking visual (SIFT) features while the robot moves. The depth information transforms the 2D points into 3D points, which can be used directly to estimate the camera motion (a rigid transformation, rotation R and translation t) between consecutive images.
Using the camera motion (R, t) and the depth information (a horizontal scan line from the depth image) allows building an occupancy grid map. Once the map of the environment is available, one computes the trajectory between start and goal points using a path planner such as Potential Fields, PRM or RRT [6]. If an obstacle interferes with the chosen path, the robot stops and computes a new path considering the obstacle found. When it reaches the goal, it starts planning the path to the next goal, if there is one. Figure 1.2 summarizes the proposed solution.
Figure 1.2: Basic Diagram of the proposed solution (mapping: occupancy grids and favorite locations; sentinel: navigate within the mapped areas).
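The motion-estimation step described above (three or more matched 3D points, rigid transformation, Procrustes problem) admits a closed-form SVD solution. The sketch below is a standard Kabsch/Procrustes solver in NumPy, given only as an illustration of the technique; it is not the thesis implementation, and the function name is ours.

```python
import numpy as np

def estimate_rigid_transform(P, Q):
    """Estimate R, t such that Q ~ R @ P + t from matched 3D points.

    P, Q: (3, N) arrays of corresponding points, N >= 3 and not collinear.
    Solves the orthogonal Procrustes problem in closed form via SVD.
    """
    p_mean = P.mean(axis=1, keepdims=True)
    q_mean = Q.mean(axis=1, keepdims=True)
    # Cross-covariance of the centered point sets.
    H = (Q - q_mean) @ (P - p_mean).T
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) in the recovered rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt
    t = q_mean - R @ p_mean
    return R, t
```

In practice the SIFT matches contain outliers, so a solver like this is usually wrapped in a robust loop (e.g. RANSAC) before the estimate is trusted.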
Alternatively, we also want to assess the use of color cameras. In order to infer depth information, we consider a SLAM methodology. Constraints on the scene structure should be explored in the design of improved filtering of the raw data obtained by registering features (SIFT) along image sequences.
1.4 Thesis Structure and Contributions
Chapter 1 introduces the problem approached in the dissertation and, in particular, presents a short discussion of the state of the art. Chapter 2 presents the theoretical background on the camera model, perspective projection and Kalman filtering. Chapter 3 describes how mapping and navigation using a color-depth camera is done. Chapter 4 describes how mapping and navigation using a monocular camera is considered. Chapter 5 shows the results of experiments done to validate the ideas presented. Chapter 6 summarizes the work performed and highlights the main achievements of this work. Moreover, that chapter proposes further work to extend the activities described in this document.
The main contribution of this thesis is the presentation of a new approach to the vSLAM methodology under the assumption that the camera navigates within a tubular-shaped structure with a rectangular section, simulating a corridor-like environment.
The work described hereafter was partially published in [17].
1.5 Acknowledgments
This work was developed under the project POETICON++ - Robots Need Language: A Computational Mechanism for Generalization and Generation of New Behaviors in Robots, in which VisLab is involved.
Chapter 2
Background
This chapter provides an introduction to the theoretical background concepts that will be used in the thesis. Two main aspects are introduced here: the filtering of noisy data based on Kalman filtering, and the camera model that gives structure to the measurement equations of the filtering process. In the following, we first detail the camera model and then the filtering.
2.1 Camera Model
The principles of perspective were discovered during the Italian Renaissance by da Vinci and other architects and painters, who found out how we see the 3D world. Cameras project 3D points onto the image plane. This transformation is not invertible unless we have additional information about the point's location in space (for instance, its depth). Cameras perform a projection of points in space onto the image plane, described by the perspective projection

\[ m = \Pi M \tag{2.1} \]

where M and m are the homogeneous coordinates of the points in space and in the image plane, and Π is a 3 × 4 matrix denoted as the camera matrix.
2.1.1 Perspective Projection
Cameras concentrate electromagnetic radiation (light rays) on the surface of their sensors using lenses. This whole process can, however, be approximated by the pinhole model [12], which reduces the lenses to a small hole through which all arriving light rays pass, called the Optical Center (O). Each point P in 3D space is projected onto a point p obtained by intersecting the straight line PO with the image plane. The optical axis of the camera is the straight line orthogonal to the image plane that contains the optical center; their intersection is called the principal point of the image.
In Figure 2.1 we can see two frames: one for the 3D points, centered in the Optical Center, and another for the 2D points (pixels), centered in the principal point of the image.

Figure 2.1: Perspective Projection

The coordinates of point P are (X, Y, Z) and those of its projected point p are (x, y). Using similar triangles we get

\[ x = -f\,\frac{X}{Z}, \qquad y = -f\,\frac{Y}{Z} \tag{2.2} \]
where f is the focal length of the camera. All these variables are measured in meters. For simplification purposes we can use the frontal model of the pinhole camera, shown in Figure 2.2, where

Figure 2.2: Frontal Model

\[ x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z} \tag{2.3} \]
All these equations become simpler if we represent the two points P, p using homogeneous coordinates:

\[ P = \alpha \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad p = \beta \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{2.4} \]
where α and β are arbitrary constants. The normalized camera model (f = 1) in homogeneous coordinates is written as

\[ Z \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2.5} \]
2.1.2 Extrinsic Parameters
The coordinates of a point in space are typically not known relative to the camera's optical center, since the optical center is hidden inside the camera. The coordinates X0 of a point in another frame, with arbitrary location and orientation, are related to the camera coordinates by a rigid-body transformation (rotation + translation):

\[ X = R X_0 + T \tag{2.6} \]

where R (3×3) is a rotation matrix and T (3×1) is a translation vector. These parameters (R, T) are called the extrinsic parameters of the camera, since they depend on the camera pose. The transformation can be written using homogeneous coordinates:

\[ \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \\ 1 \end{bmatrix} \tag{2.7} \]
2.1.3 Intrinsic Parameters
The coordinates of a point in an image are measured in pixels, and the image frame is usually not centered at the principal point of the camera. The relation between the homogeneous coordinates of image points in meters and in pixels is given by

\[ K = \begin{bmatrix} f s_x & f s_\theta & o_x \\ 0 & f s_y & o_y \\ 0 & 0 & 1 \end{bmatrix}, \qquad x' = K x \tag{2.8} \]

where f is the focal length, s_x and s_y are the conversion factors from meters to pixels along the respective axes, o_x, o_y are the coordinates of the principal point in the new frame, and s_θ is a skew factor. These parameters are denoted the intrinsic parameters of the camera, and they are independent of the camera pose. The matrix K (3×3) is called the matrix of intrinsic parameters. The camera model with intrinsic and extrinsic parameters is given by

\[ \lambda x' = K [R \; T] X_0 \tag{2.9} \]
or

\[ \lambda x' = \Pi X_0 \tag{2.10} \]

with Π = K[R T]. The camera matrix Π can be written as

\[ \Pi = \begin{bmatrix} \pi_1^T \\ \pi_2^T \\ \pi_3^T \end{bmatrix} \tag{2.11} \]

where π_i^T is the i-th row of Π. Using λ = π_3^T X_0 we get

\[ x' = \frac{\pi_1^T X_0}{\pi_3^T X_0}, \qquad y' = \frac{\pi_2^T X_0}{\pi_3^T X_0} \tag{2.12} \]
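As a numerical sanity check of equations 2.9-2.12, the sketch below (NumPy; the function name and example intrinsics are ours, not from the thesis) forms Π = K[R T] and projects a world point to pixel coordinates:

```python
import numpy as np

def project_point(K, R, T, X0):
    """Project world point X0 to pixels via lambda x' = K [R T] X0 (eq. 2.9)."""
    Pi = K @ np.hstack([R, T.reshape(3, 1)])   # 3x4 camera matrix
    x_h = Pi @ np.append(X0, 1.0)              # homogeneous image point
    return x_h[:2] / x_h[2]                    # divide by lambda = pi3^T X0 (eq. 2.12)

# With the camera at the world origin, a point on the optical axis must
# project to the principal point (o_x, o_y).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = project_point(K, np.eye(3), np.zeros(3), np.array([0.0, 0.0, 2.0]))
# uv -> (320.0, 240.0)
```

The division by λ in the last line of the function is exactly the ratio of rows of Π in equation 2.12.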
2.1.4 Depth Image to 3D Points
Cameras project points in 3D onto the image plane. The spatial information of the points is lost in this projection unless some additional information is available. Depth cameras give that additional information, making it possible to locate in space the position of every pixel in the image. The combination of the depth value with the camera intrinsics and extrinsics can be used to reconstruct a point cloud of the scene. The camera represents Cartesian (X, Y, Z) points as a depth map Z(u, v), where (u, v) are image (perspective projection) coordinates. Equation 2.13 relates the 3D position of a point with its projection in the image plane:
\[ \lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{2.13} \]
which, combining the first two matrices on the right-hand side, is the same as

\[ \begin{bmatrix} \lambda u \\ \lambda v \\ \lambda \end{bmatrix} = \begin{bmatrix} \alpha_u f & 0 & u_0 & 0 \\ 0 & \alpha_v f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{2.14} \]
If we set λ = d, where d = Z(u, v) is the depth value for the pixel at image coordinates (u, v):

\[ d u = \alpha_u f X + u_0 Z, \qquad d v = \alpha_v f Y + v_0 Z, \qquad d = Z \tag{2.15} \]
Solving for X, Y and Z, we arrive at

\[ X(u,v) = \frac{u - u_0}{\alpha_u f}\, Z(u,v), \qquad Y(u,v) = \frac{v - v_0}{\alpha_v f}\, Z(u,v), \qquad Z(u,v) = d(u,v) \tag{2.16} \]

which gives the 3D coordinates of the pixel at (u, v).
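Equation 2.16 vectorizes directly over a whole depth image. The sketch below (NumPy) folds α_u·f and α_v·f into single pixel focal lengths f_x, f_y; the function name and parameterization are ours, given only as an illustration:

```python
import numpy as np

def depth_to_points(depth, fx, fy, u0, v0):
    """Per-pixel 3D points (camera frame) from a depth image, per eq. 2.16.

    depth: (H, W) array of Z values; fx = alpha_u * f and fy = alpha_v * f are
    the focal lengths in pixels; (u0, v0) is the principal point.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    X = (u - u0) / fx * depth
    Y = (v - v0) / fy * depth
    return np.stack([X, Y, depth], axis=-1)   # shape (H, W, 3)
```

A horizontal stripe of the resulting point cloud is exactly the kind of free-space reading that is integrated into the occupancy grid in Chapter 3.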
2.1.5 Back-projection
The 3-D ray that starts at the camera optical center C and pierces the image plane at the projected point eventually reaches the 3-D point. The camera center C is defined by the projection matrix P through PC = 0. If we partition P as P = [p_{123} | p_4], where p_{123} is the 3 × 3 matrix formed by the first three columns of P and p_4 is its last column, we can compute the camera center as

\[ PC = 0 \;\Leftrightarrow\; P \begin{bmatrix} C \\ 1 \end{bmatrix} = 0 \;\Leftrightarrow\; p_{123} C + p_4 = 0 \;\Leftrightarrow\; C = -p_{123}^{-1} p_4. \tag{2.17} \]

D is the back-projection of the 2-D point in 3-D coordinates, defined by the ray leaving the camera center and intersecting the plane at infinity, and its representation becomes

\[ x = PD \;\Leftrightarrow\; P \begin{bmatrix} D \\ 0 \end{bmatrix} = x \;\Leftrightarrow\; p_{123} D = x \;\Leftrightarrow\; D = p_{123}^{-1} x \tag{2.18} \]

where x is in homogeneous coordinates. C and D determine a line in 3-D space, whose points are given by

\[ X = C + \alpha D \tag{2.19} \]

where α ∈ [−∞, +∞] is a scaling factor. The full expression for the back-projection of a point x into X is

\[ X = -p_{123}^{-1} p_4 + \alpha\, p_{123}^{-1} x. \tag{2.20} \]
2.2 Integration of measurements
A camera provides many measurements at each time stamp. In this thesis, color information allows registering features of the scenario, while the geometric location of the features allows inferring pose properties of the camera. Camera data is naturally corrupted by noise, both radiometric and geometric. Kalman filtering is a methodology for integrating camera measurements over time, such that the random nature of the noise is progressively attenuated.
In 1960, R.E. Kalman published his famous paper [13] proposing a recursive solution to the discrete-data linear filtering problem. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) solution of the least-squares method. The filter estimates optimal states over time, given observations, in the presence of noise, of a dynamic system driven by noise inputs. This estimation is computed recursively as a Markov chain model, i.e. it only requires the previous estimate k − 1 of a given state at instant k. The optimality comes from minimizing the estimated error covariance. The general problem is thus to estimate the state x ∈ ℜ^n of a discrete-time controlled process governed by the linear stochastic difference equation

\[ x_{k+1} = A_k x_k + B u_k + w_k, \tag{2.21} \]

with a sensor measurement z ∈ ℜ^m given by

\[ z_k = H_k x_k + v_k. \tag{2.22} \]

The random variables w_k and v_k represent the process and measurement noise, respectively. They are assumed to be independent of each other, white, and with normal probability distributions

\[ p(w) \sim \mathcal{N}(0, Q), \qquad p(v) \sim \mathcal{N}(0, R). \tag{2.23} \]

The n × n matrix A in equation 2.21 relates the state at instant k to the state at instant k + 1, in the absence of either an input or process noise. The n × l matrix B relates the control input u ∈ ℜ^l to the state x. The m × n matrix H in the measurement equation 2.22 associates the state to the measurement z_k.
2.2.1 The Discrete Kalman Filter Algorithm
The Kalman filter estimates a process by using a form of feedback control: it estimates the process state
at some time and then obtains feedback in the form of noisy measurements. The filter equations therefore
fall into two groups: time update and measurement update equations. The time update equations are
responsible for projecting the current state and error covariance estimates forward in time to obtain the a
priori estimates for the next time step. The measurement update equations are in charge of incorporating the new
measurements into the a priori estimates to obtain an improved a posteriori estimate. Figure 2.3 shows
the basic operation of the filter.
The specific equations for the prediction step in the discrete Kalman filter are
$$\bar{x}_{k+1} = A_k x_k + B u_k \quad (2.24)$$

$$\bar{P}_{k+1} = A_k P_k A_k^T + Q_k \quad (2.25)$$
Figure 2.3: The discrete Kalman filter cycle.
and for the update step are
$$K_k = \bar{P}_k H_k^T \left(H_k \bar{P}_k H_k^T + R_k\right)^{-1} \quad (2.26)$$

$$x_k = \bar{x}_k + K_k \left(z_k - H_k \bar{x}_k\right) \quad (2.27)$$

$$P_k = \left(I - K_k H_k\right) \bar{P}_k \quad (2.28)$$
In the time update equations 2.24 and 2.25 it is clear that the state and covariance estimates are
projected from instant $k$ to $k+1$. The bar over a symbol means it is an a priori estimate. The
measurement update starts by computing the Kalman gain $K_k$ in 2.26, then uses the current
measurement to generate an a posteriori state estimate in 2.27, and at the same time obtains a new
error covariance estimate as in 2.28. This process repeats after each predict-update cycle is complete,
and the new a posteriori estimates are used as inputs to the next cycle's prediction step.
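The predict-update cycle of equations 2.24-2.28 can be sketched as follows; this is a minimal NumPy illustration of the textbook algorithm, not the thesis implementation:

```python
import numpy as np

def kalman_step(x, P, u, z, A, B, H, Q, R):
    """One predict-update cycle of the discrete Kalman filter
    (equations 2.24-2.28)."""
    # Time update: project state and covariance forward (a priori estimates).
    x_prior = A @ x + B @ u                       # eq. 2.24
    P_prior = A @ P @ A.T + Q                     # eq. 2.25
    # Measurement update: fold the noisy observation into the estimate.
    S = H @ P_prior @ H.T + R
    K = P_prior @ H.T @ np.linalg.inv(S)          # eq. 2.26, Kalman gain
    x_post = x_prior + K @ (z - H @ x_prior)      # eq. 2.27
    P_post = (np.eye(len(x)) - K @ H) @ P_prior   # eq. 2.28
    return x_post, P_post
```

A typical use is a constant-velocity tracker, with $A = \begin{pmatrix}1 & \Delta t\\ 0 & 1\end{pmatrix}$ and $H = (1 \; 0)$ observing only the position.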
2.2.2 Extended Kalman Filter
The Extended Kalman Filter (EKF) handles non-linear system and measurement models
through linearization with a first-order Taylor series expansion. Both the Kalman filter and its non-linear
extension can be described in two main steps with a standard set of equations, denominated respectively
as prediction and update, also known as the a priori and a posteriori estimates. Let the state vector $s_k$ be
represented as a Gaussian process with mean $x_k$ and covariance matrix $P_k$ expressing the uncertainty of the
estimate, i.e. $s_k \sim \mathcal{N}(x_k, P_k)$, with the initial state estimate also normally distributed,
$s_0 \sim \mathcal{N}(x_0, P_0)$.
EKF Prediction Step The predicted transition of the mean $x_k$ and the covariance matrix $P_k$ from an
instant $k$ to $k+1$ is accomplished through the following set of equations:

$$x_{k+1} = f(x_k, u_k, \epsilon_k) \quad (2.29)$$

$$P_{k+1} = F_k P_k F_k^T + G_k Q_k G_k^T \quad (2.30)$$
The non-linear system dynamics model function $f(x_k, u_k, \epsilon_k)$ depends on the control input $u_k$
and on the input transition noise $\epsilon_k \sim \mathcal{N}(0, Q_k)$ with covariance matrix $Q_k$. $F_k$ and $G_k$ stand for the state
transition Jacobian matrix and the input transition noise Jacobian matrix, linearized around the predicted
estimate $x_{k+1}$.
EKF Update Step Whenever a measurement $z_{k+1}$ becomes available after the prediction step, one
applies the update procedure:

$$S_{k+1} = H_{k+1} P_{k+1} H_{k+1}^T + J_{k+1} R_{k+1} J_{k+1}^T \quad (2.31)$$

$$K_{k+1} = P_{k+1} H_{k+1}^T S_{k+1}^{-1} \quad (2.32)$$

$$x_{k+1} = x_{k+1} + K_{k+1}\left(z_{k+1} - h_{k+1}(x_{k+1}, \delta_{k+1})\right) \quad (2.33)$$

$$P_{k+1} = \left(I - K_{k+1} H_{k+1}\right) P_{k+1} \quad (2.34)$$
where $h_{k+1}(x_{k+1}, \delta_{k+1})$ is the observation model function, which produces estimates of the sensor measurements,
and $H_{k+1}$ is the observation model Jacobian matrix, linearized around the estimate $x_{k+1}$. $R_{k+1}$ defines
the measurement model noise covariance matrix, from the Gaussian noise $\delta_k \sim \mathcal{N}(0, R_k)$. The innovation
process $\nu_{k+1} = z_{k+1} - h_{k+1}(x_{k+1}, \delta_{k+1})$, which measures the error between actual and
predicted measurements, is normally distributed with the innovation covariance
matrix $S_{k+1}$. Much like $G_k$ in the prediction step, $J_{k+1}$ is the measurement noise Jacobian matrix, linearized
around the predicted state estimate. Finally, the Kalman gain $K_{k+1}$ weighs the predicted state against the
newly observed measurements.
2.2.3 Other Types of Filters
There are other data filtering techniques used in robotics besides the previously mentioned
Kalman filter. Particle filters [22] are a great way to track the state of a dynamic system for which a
Bayesian model is available. That is, given a model of how the system changes in time, possibly
in response to inputs, and a model of what observations should be seen in particular states, one can use
particle filters to track the belief state.
Particle Filters
The particle filter (PF) was introduced in 1993 as a numerical approximation to the nonlinear Bayesian
filtering problem; today there is a rather mature theory as well as a number of successful applications
described in the literature. A particle filter is a set of on-line posterior density estimation algorithms
that estimate the posterior density of the state space by directly implementing the Bayesian recursion
equations.
Its objective is to estimate the posterior density of the state variables given the observation variables.
The particle filter is designed for a hidden Markov model, where the system consists of hidden and
observable variables. The observable variables (observation process) are related to the hidden variables
(state process) by some functional form that is known. Similarly, the dynamical system describing the
evolution of the state variables is also known probabilistically. A generic particle filter estimates the
posterior distribution of the hidden states using the observation measurement process.
Particle methods assume that the state $x_k$ and the observations $y_k$ can be modeled as follows:
$x_k \mid x_{k-1} \sim p_{x_k \mid x_{k-1}}(x \mid x_{k-1})$, where $x_0, x_1, \ldots$ is a first-order Markov process with initial distribution
$p(x_0)$, and the observations $y_k \mid x_k \sim p_{y \mid x}(y \mid x_k)$ are conditionally independent provided that $x_0, x_1, \ldots$ are
known, meaning that $y_k$ depends only on $x_k$. A generic example is

$$x_k = g(x_{k-1}) + w_k \quad (2.35)$$

$$y_k = h(x_k) + v_k \quad (2.36)$$

where $w_k$ and $v_k$ are mutually independent, identically distributed sequences with known probability density
functions, and $g$ and $h$ are also known functions. These look very similar to the state-space equations
of the Kalman filter, except that for the discrete Kalman filter $g$ and $h$ are linear and $w_k$ and $v_k$ are
Gaussian. Kalman-filter-based methods are a first-order approximation (EKF), whereas particle filters
are also an approximation but, with enough particles, can be much more accurate.
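A bootstrap particle filter for the model of equations 2.35-2.36 can be sketched as follows; the Gaussian likelihood and the multinomial resampling below are illustrative choices, not prescribed by the text:

```python
import numpy as np

def particle_filter_step(particles, weights, y, g, h,
                         process_noise_std, meas_noise_std, rng):
    """One bootstrap particle filter step for the model
    x_k = g(x_{k-1}) + w_k,  y_k = h(x_k) + v_k  (eqs. 2.35-2.36)."""
    # Propagate each particle through the (possibly non-linear) dynamics.
    particles = g(particles) + rng.normal(0.0, process_noise_std,
                                          size=particles.shape)
    # Reweight by the measurement likelihood p(y_k | x_k), Gaussian here.
    weights = weights * np.exp(-0.5 * ((y - h(particles)) / meas_noise_std) ** 2)
    weights /= weights.sum()
    # Resample to counter weight degeneracy (systematic resampling also works).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights
```

The state estimate at each step is the weighted mean of the particles; repeating the step with informative measurements concentrates the particle cloud around the true state.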
2.3 Application to the thesis context
In the following we describe a simple map-building and navigation methodology based on occupancy
grids and trajectory planning. The camera is assumed to be a color-depth camera, such as the Microsoft
Kinect. More details are given in the next chapter.

After developing a basic navigation methodology, we study and develop a methodology to use
a simpler camera, namely a standard color camera, hence without depth measurements. To
compensate for the lack of depth measurements we use an EKF-based approach, more precisely Mono-
SLAM [7]. The dynamic motion model is a 6D constant-velocity model. In particular we assume the
hypothesis that the scene structure is nearly tubular. This hypothesis further constrains
the data, allowing more effective filtering, and in addition keeps the state size constant. More details
are given in Chapter 4.
Chapter 3
Navigation using Color-Depth Cameras
Navigation is the process of monitoring and controlling the movement of the robot from one location to
another. In other words, the robot needs to be able to locate itself relative to other
known locations (for instance, its last location). In this chapter we detail how mapping
and navigation using a color-depth camera are handled. A diagram with the basic steps of the implemented
navigation process is shown in Figure 3.1:
Figure 3.1: Diagram detailing the steps for the navigation.
3.1 Occupancy Grids
The principal idea of the occupancy grid [9] is to represent a map of the environment as an evenly spaced
field of binary random variables, each representing the presence or absence of an obstacle at that location in
the environment. Occupancy grid algorithms compute approximate posterior estimates for these random
variables.
The goal of these algorithms is to estimate the posterior probability over maps given the data,
$p(m \mid z_{1:t}, x_{1:t})$, where $m$ is the map, $z_{1:t}$ are the sensor readings from the beginning until time $t$,
and $x_{1:t}$ are the positions and orientations of the robot at the same time steps.

The posterior over the map is approximated by factoring it into $p(m \mid z_{1:t}, x_{1:t}) = \prod_i p(m_i \mid z_{1:t}, x_{1:t})$.
Due to this factorization, a binary Bayes filter can be used to estimate the occupancy probability of each
grid cell.
In this section we use the depth and color information directly provided by the camera (a Microsoft
Kinect-like camera) to build an occupancy grid map.
3.1.1 Motion Estimation
We need to estimate where the camera is at every step in order to correctly update the occupancy grid.
For that we use the RGB images. We now explain in more detail how that estimation is
made.

How the camera motion estimation between steps is made

First, SIFT [16] features are computed in the current and previous RGB images. Then a matching
algorithm returns the corresponding features in both images. By displaying these matched points over
the images (Figure 5.5a) we can see that there are still some wrong matches, or outliers. We therefore apply a
RANSAC filtering algorithm to the matches, using a rigid transformation model, to find the inliers
in the list of matches. This is only possible because we have depth information for the features
and can thus infer their 3D positions and use the rigid transformation model. We feed only those inliers
to Matlab's procrustes function, which returns the rigid transformation between the two
point clouds (one for each image). This greatly improves the motion estimation of the camera and
consequently the final map.
Steps to compute a rigid transformation
Let $P = \{p_1, p_2, \ldots, p_n\}$ and $Q = \{q_1, q_2, \ldots, q_n\}$ be two sets of corresponding points in $\mathbb{R}^d$. We wish to
find a rigid transformation that optimally aligns the two sets in the least-squares sense:
$$(R, t) = \operatorname*{argmin}_{R,\,t} \sum_{i=1}^{n} w_i \left\| (R p_i + t) - q_i \right\|^2 \quad (3.1)$$
where the $w_i$ are weights assigned to each pair of points. We briefly explain the steps to compute the optimal
translation $t$ and rotation $R$ that minimize the Euclidean distance between one point cloud and the other,
transformed (Equation 3.1). First compute the weighted centroids of both point sets:
$$\bar{p} = \frac{\sum_{i=1}^{n} w_i\, p_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{q} = \frac{\sum_{i=1}^{n} w_i\, q_i}{\sum_{i=1}^{n} w_i}. \quad (3.2)$$
Then, center the vectors of each point cloud:

$$x_i := p_i - \bar{p} \quad (3.3)$$

$$y_i := q_i - \bar{q}, \qquad i = 1, 2, \ldots, n. \quad (3.4)$$
Next, calculate the $d \times d$ covariance matrix

$$S = X W Y^T, \quad (3.5)$$

where $X$ and $Y$ are $d \times n$ matrices having the $x_i$ and $y_i$ as their columns, respectively, and $W = \operatorname{diag}(w_1, w_2, \ldots, w_n)$.
Now we compute the singular value decomposition $S = U \Sigma V^T$, and get the rotation

$$R = V \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & 1 & \\ & & & \det(V U^T) \end{pmatrix} U^T. \quad (3.6)$$
Finally, we just need to get the translation

$$t = \bar{q} - R\,\bar{p}. \quad (3.7)$$
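The steps of equations 3.1-3.7 can be collected into a single routine; this is a NumPy sketch of the weighted Procrustes/Kabsch alignment, whereas the thesis itself relies on Matlab's procrustes function:

```python
import numpy as np

def rigid_transform(P, Q, w=None):
    """Weighted least-squares rigid alignment (equations 3.1-3.7).
    P, Q are d x n arrays of corresponding points; returns R, t
    such that Q is approximately R @ P + t."""
    d, n = P.shape
    w = np.ones(n) if w is None else np.asarray(w, float)
    # Eq. 3.2: weighted centroids of both point sets.
    p_bar = (P * w).sum(axis=1) / w.sum()
    q_bar = (Q * w).sum(axis=1) / w.sum()
    # Eqs. 3.3-3.4: center both clouds.
    X = P - p_bar[:, None]
    Y = Q - q_bar[:, None]
    # Eq. 3.5: d x d weighted covariance matrix.
    S = X @ np.diag(w) @ Y.T
    # Eq. 3.6: SVD, with a determinant correction to exclude reflections.
    U, _, Vt = np.linalg.svd(S)
    D = np.eye(d)
    D[-1, -1] = np.linalg.det(Vt.T @ U.T)
    R = Vt.T @ D @ U.T
    # Eq. 3.7: translation.
    t = q_bar - R @ p_bar
    return R, t
```

For noiseless correspondences this recovers the true rotation and translation exactly; with RANSAC inliers it gives the least-squares fit.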
3.1.2 Map Building
To validate this approach an experiment was conducted, and the results are presented next. Using the data
from a run around a VRML world composed of four corridors and an elevator hall (simulating the 5th
floor of the IST North Tower), we have the projection matrices and depth maps for every step. By choosing
the middle line of the depth image and assuming that the camera stays horizontal the whole time, we can
simulate a laser range finder. At each step we have measurements of obstacles. These measurements are
relative to the camera position in the world, so we have to take that into account and put the sensor
readings in world coordinates (change from the camera reference frame to the world reference frame). Then we need
to use that information (obstacles and free space) to update our probability grid. Each cell in the grid
holds a probability value for the belief that it is occupied. To update the free space we sample the
area the camera is viewing and check which samples fall inside the polygon defined by the obstacle
points and the camera position in the world. The free cells are updated with the following equations
$$Bel\_Occ = (1 - TrueHit) \times p(x, y) \quad (3.8)$$

$$Bel\_Unocc = (1 - FalseAlarm) \times (1 - p(x, y)) \quad (3.9)$$

$$p(x, y) = \frac{Bel\_Occ}{Bel\_Occ + Bel\_Unocc} \quad (3.10)$$
The values $Bel\_Occ$ and $Bel\_Unocc$ are the beliefs that the cell is occupied or not, respectively. We then
normalize the probability to get $p(x, y)$. The locations where obstacles were detected are
updated with
$$Bel\_Occ = TrueHit \times p(x, y) \quad (3.11)$$

$$Bel\_Unocc = FalseAlarm \times (1 - p(x, y)) \quad (3.12)$$

$$p(x, y) = \frac{Bel\_Occ}{Bel\_Occ + Bel\_Unocc} \quad (3.13)$$
The variables $TrueHit$ and $FalseAlarm$ represent, respectively, the probability that the sensor detects an
obstacle that is actually there, and the probability that the sensor detects something that is not there. We
do this for all the steps, and in the end we have an occupancy grid map of the environment. Figure 3.2
shows various steps of the occupancy grid mapping process. This was done with the ground-truth position
of the camera known at every step.
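The cell update of equations 3.8-3.13 can be sketched as a single function; the default true_hit and false_alarm values below are illustrative, not taken from the thesis:

```python
def update_cell(p, observed_occupied, true_hit=0.9, false_alarm=0.1):
    """Binary Bayes update of one grid cell's occupancy probability,
    following equations 3.8-3.13. The sensor model values true_hit and
    false_alarm are illustrative defaults."""
    if observed_occupied:           # obstacle detected: eqs. 3.11-3.12
        bel_occ = true_hit * p
        bel_unocc = false_alarm * (1.0 - p)
    else:                           # cell observed free: eqs. 3.8-3.9
        bel_occ = (1.0 - true_hit) * p
        bel_unocc = (1.0 - false_alarm) * (1.0 - p)
    # Normalize: eqs. 3.10 / 3.13.
    return bel_occ / (bel_occ + bel_unocc)
```

Repeated consistent observations push a cell's probability toward 1 (occupied) or 0 (free), which is the intended accumulation behavior of the grid.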
3.1.3 Loop Closure
The idea of loop closure is to propagate backwards the error observed when we know where the
robot should be and where it actually is, for instance when we know it started and ended at the same
place but, in the map built, it did not. When the robot passes through an area it has visited before, it
should be able to recognize it and correct its location, meaning that ideally the map should be a closed
loop if the robot performed an actual loop.

The adopted solution was the following: once loop closure is detected (i.e. a past location is being visited
again), and using the projection matrices at every step, it is possible to build a profile of the movement of
the camera along the $x$, $y$, $z$ axes for the translation and $\alpha$, $\beta$, $\gamma$ (the Euler angles computed using the Rodrigues
formula [26]) for the orientation; i.e., we can see in Figure 3.3 how the camera evolves in these reference
frames over all steps. Based on this evolution we need to distribute the final error over all the steps. If we
compute new camera projection matrices with the error correction added and rebuild the occupancy grid,
Figure 3.2: Various steps of the occupancy grid mapping process: (a) initial map; (b) step 1; (c) step 12; (d) step 25; (e) step 50; (f) step 75; (g) step 100; (h) final map.
the final error should be smaller. So for each camera matrix we update its rotation matrix and translation
vector with the weighted error. The only things we have to compute are essentially the error itself and the
weight of the error added at each step. The error is just the difference between the desired and actual end
poses. To find the weight of the error to add at each step we used the formula (written here for $x$, but
identical for the other coordinates)

$$e_x(t) = e_{x\max} \left( -\frac{1}{2} + \frac{1}{M_x} \int_0^t |v_x(\tau)|\, d\tau \right), \quad t \in [0, t_{max}], \qquad M_x \equiv \int_0^{t_{max}} |v_x(\tau)|\, d\tau \quad (3.14)$$

where $M_x$ is a normalization constant and $v_x$ is the velocity at each instant.
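A discrete version of equation 3.14 can be sketched as follows, replacing the integral with a cumulative sum over steps; the example velocities are hypothetical:

```python
import numpy as np

def error_weights(v, e_max):
    """Discrete version of equation 3.14: distribute the final pose error
    e_max over the trajectory steps, weighted by cumulative absolute
    velocity, so that steps where the camera moved more absorb more of
    the correction. `v` holds the per-step velocities of one coordinate."""
    v = np.abs(np.asarray(v, float))
    M = v.sum()                    # normalization constant M_x
    cum = np.cumsum(v) / M         # (1/M_x) * integral of |v_x|
    return e_max * (-0.5 + cum)    # e_x(t) evaluated at every step
```

Steps with zero velocity receive no additional correction relative to their predecessor, while the total sweep of the correction across the trajectory equals e_max.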
3.2 Motion Planning
Motion planning is a term used in robotics for the process of breaking down a desired movement task
into discrete motions that satisfy movement constraints and possibly optimize some aspect of the movement.
In this section we detail some trajectory planning methodologies, namely Potential Fields,
Probabilistic Roadmaps, and Rapidly Exploring Random Trees.
3.2.1 Potential Fields
Potential Fields is a motion planning technique in which the robot is treated as a particle placed in an
influencing potential field [11]. The topology of the field is created by the designer: one first creates
multiple behaviors, each assigned a particular task or function, then represents each of these behaviors
as a potential field, and finally produces the robot motion by combining
Figure 3.3: Profiles for the estimated motion of the camera (translation and orientation).
the referred fields (an example can be seen in Fig. 3.4). One can also think of it as a marble rolling down
hills that represent the influence of all the behaviors present.

Figure 3.4: Example of a Potential Field. Image taken from [24]

The action vector, the speed and orientation of the robot, is produced by each behavior based on
its purpose. The behavior of the goal state is to attract the agent, and that of an obstacle is to repel it. The
collection of these vectors over all possible locations of the agent is the potential field.

Based on this potential field the robot knows how to move around: it moves with the speed and
orientation of the vector at its current location.
Different Types of Potential Fields
There are more types of behaviors than just the goal and obstacle behaviors described above; it all depends on
the situation one wants to create. Follow/avoid a wall, circle around something, and do not repeat past
locations are just a few of the behaviors that can be recreated by choosing the right potential function to model
the wanted behavior. In this work, the avoid-past-locations behavior was implemented to escape local
minima where the agent gets stuck. This is done by adding a repulsive trail wherever the agent passes.
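A minimal attractive/repulsive potential field can be sketched as follows; the gain and influence-radius values are illustrative, and the repulsive term uses a common textbook form rather than the exact functions of this work:

```python
import numpy as np

def potential_field_step(pos, goal, obstacles,
                         k_att=1.0, k_rep=5.0, influence=3.0):
    """Combined action vector at `pos` from an attractive goal behavior and
    repulsive obstacle behaviors (gains and radius are illustrative)."""
    # Attractive behavior: pull toward the goal, proportional to distance.
    force = k_att * (goal - pos)
    # Repulsive behavior: push away from each obstacle within its influence radius.
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0.0 < d < influence:
            force += k_rep * (1.0 / d - 1.0 / influence) * diff / d**3
    return force
```

The robot then moves a small step along the (normalized) action vector; adding a repulsive trail over visited cells, as done in this work, amounts to appending past positions to the obstacle list.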
3.2.2 Randomized Methods for Path Planning
Motion planning problems involve searching the configuration space of one or more complicated
geometric bodies for a collision-free path that connects given start and goal configurations, while satisfying
constraints imposed by complicated obstacles. Complete algorithms are known to solve this class
of problems, but their computational complexity limits their use to low-dimensional configuration spaces.
This motivated the development of randomized methods. Although these methods are not complete, they
will, with probability approaching one, find a solution if one exists, given enough time. Randomized path
planning algorithms were designed for one of two contexts: single-query or multiple-query planning. For
single-query planning, it is assumed that a single path planning problem must be solved quickly, without any
preprocessing. For multiple-query planning, many path planning problems will be solved for the same
environment, and it is worthwhile to preprocess information and store it in a data structure that allows fast
path planning queries. In the coming sections a brief explanation of two randomized methods is presented.
3.2.3 Probabilistic Roadmaps
The Probabilistic Roadmaps algorithm works in two stages, the first called the learning stage and the second
called the query stage [3]. Given an instance of the RMP (Robot Mapping Problem), in the learning stage
the algorithm samples the configuration space and builds an undirected graph $G = (V, E)$ which captures
the information gathered. This graph is called the Probabilistic Roadmap; an example is depicted
in Fig. 3.5. In the query stage, the PRM is used to solve specific RMP problem instances, which are then
reduced to a graph search.
Figure 3.5: Example of a Roadmap. Image taken from [20]
The learning stage starts by generating a random configuration $c \in C_{free}$, where $C_{free}$ is the set of
configurations for which the robot does not collide with any static obstacle. It then checks for nearby vertices
whose distance to $c$ is lower than a certain threshold, connects them in increasing order of
their distance, and the planner verifies that the path between them is free.

In the query stage two configurations are given (start and goal). Both are connected to the graph
by the same ascending-distance method, and it is checked whether they belong to the same connected
component of the graph. If so, the path is returned. Possibly, path smoothing is later applied for a better solution.
Notes on PRM algorithm
The learning stage takes much more time than the query stage, so the roadmap is only worthwhile when
multiple queries are made. If the robot is to move in a dynamic environment, the PRM becomes invalid as
obstacles move, wasting the time spent building it. The method also has difficulty finding narrow passages
in the configuration space $C$.
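The learning stage described above can be sketched as follows, under a hypothetical 100x100 workspace; the in_free_space and path_is_free callables stand in for the collision checker:

```python
import numpy as np

def build_prm(n_samples, radius, in_free_space, path_is_free, rng):
    """PRM learning stage (simplified sketch): sample C_free, then connect
    each sample to its neighbors, in increasing order of distance, whenever
    the local path between them is collision-free."""
    nodes = []
    while len(nodes) < n_samples:       # rejection-sample C_free
        c = rng.uniform(0.0, 100.0, size=2)   # hypothetical 100x100 workspace
        if in_free_space(c):
            nodes.append(c)
    edges = []
    for i, c in enumerate(nodes):
        # Candidate neighbors of c, sorted by increasing distance.
        dists = [(np.linalg.norm(c - q), j)
                 for j, q in enumerate(nodes) if j != i]
        for d, j in sorted(dists):
            if d < radius and path_is_free(c, nodes[j]):
                edges.append((i, j))
    return nodes, edges
```

The query stage then connects start and goal to the roadmap by the same nearest-first strategy and runs an ordinary graph search over the edges.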
3.2.4 Rapidly Exploring Random Trees
The RRT Method
The RRT is an efficient data structure and sampling scheme to quickly search high-dimensional spaces
that have both algebraic and differential constraints [14]. The basic idea is to direct the search in the
configuration space towards a solution.
Figure 3.6: Example of an expanding tree. Image taken from [15]
It starts by building a tree rooted at the starting point $q_{init}$. A random configuration $q_{rand}$ is chosen
in $C_{free}$. The current tree is extended from its closest vertex by a fixed step size in the direction of that
configuration. Every time the tree is extended, a function checks for collisions between the closest vertex
and the new vertex (Figure 3.7). There are three possible outcomes for this tree extension: the path is
clear and the tree is extended in that direction with the new vertex; the path is obstructed by an obstacle
and the new vertex is rejected; or the random configuration itself is reached and added to the tree, because it lies
closer than the fixed step size. This process is repeated until the goal configuration is reached.
Something to keep in mind is that an efficient nearest-neighbor search is important for this method. The metric
Figure 3.7: The RRT method
chosen is also fundamental.
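The extension step described above can be sketched as follows; the linear nearest-neighbor search is a simplification (a k-d tree scales better), and collision_free stands in for the collision checker:

```python
import numpy as np

def rrt_extend(tree, q_rand, step, collision_free):
    """One RRT extension step: grow the tree from its nearest vertex toward
    q_rand by at most `step` (the three outcomes of Figure 3.7)."""
    # Nearest-neighbor search under the Euclidean metric (linear scan).
    q_near = np.asarray(min(tree,
                        key=lambda q: np.linalg.norm(np.asarray(q) - q_rand)))
    diff = q_rand - q_near
    d = np.linalg.norm(diff)
    # Either q_rand is within one step (reached), or take a fixed-size step.
    q_new = q_rand if d <= step else q_near + step * diff / d
    if collision_free(q_near, q_new):
        tree.append(tuple(q_new))   # path clear: accept the new vertex
        return True
    return False                    # path obstructed: reject the vertex
```

Repeatedly sampling $q_{rand}$ and calling this step grows the tree until a vertex lands close enough to the goal configuration.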
RRT-Connect Path Planner
A variant of the RRT method is RRT-Connect. The RRT-Connect planner is designed specifically
for path planning problems that involve no differential constraints. The method is based on two
ideas: the Connect heuristic, which attempts to move over a longer distance, and the growth of two RRTs,
one from $q_{init}$ and one from $q_{goal}$.
Instead of extending an RRT by a single step, the Connect heuristic iterates the extend
step until $q$ or an obstacle is reached (Figure 3.8). With this method, the greedy heuristic is combined
with the rapid and uniform exploration properties of RRTs, which helps avoid local minima.
The idea is the following: two trees are maintained at all times until they become connected and a solution is found.
In each iteration, one tree is extended and an attempt is made to connect the nearest vertex of the other
tree to the new vertex. Then the roles are reversed by swapping the two trees. This causes both trees to
explore the free space around them while trying to establish a connection between each other.
Figure 3.8: The RRT-Connect method
The key result is that the distribution of the RRT vertices converges toward the sampling distribution,
which is uniform.
3.2.5 Comparison of Trajectory Planning Methods
In order to obtain a fair comparison of the planning methods, an experiment has been conducted. The
results were obtained after 10 runs for each method and map, under the same random conditions (randinit
for the same method in different maps). The maps used are shown in Figure 3.9. The starting point was $(20, 10)$
and the goal point $(50, 30)$ for the first map. For all the others the chosen points were start $= (10, 10)$ and
goal $= (90, 90)$.
Figure 3.9: (a) Map 1, (b) Map 2, (c) Map 3, (d) Map 4. The red circle is the starting point and the green star is the goal point. All the maps are solved with the different methods: second row, the Potential Fields method; third row, the Probabilistic Roadmaps; bottom row, the Rapidly Exploring Random Trees.
The results are summarized in Table 3.1. Looking at them, we observe that the Potential Fields
(PF) method is on average the fastest and the one that covers the least distance (except on
the first map). The Rapidly Exploring Random Trees method usually covers a larger distance but has a
running time comparable to the PF method. The Probabilistic Roadmap takes the most time to compute and
covers a distance somewhere between the previous two methods. However, its success rate drops for hard
maps, where it is critical to find a narrow passage or there are several obstacles.
Map   Method             Distance   Time [s]   Success [%]
1     Potential Fields   1414        4.2       100
1     RRT                 328.3      4.8       100
1     PRM                 280       18.68       50
2     Potential Fields     80        0.18      100
2     RRT                 143.99     0.58      100
2     PRM                 105.6     34.35      100
3     Potential Fields    121        0.23      100
3     RRT                 161.74     0.3595    100
3     PRM                 135.5     38.36      100
4     Potential Fields    108        0.34      100
4     RRT                 161.63     4.9       100
4     PRM                 137       15.1        10

Table 3.1: Average values over 10 runs for each method and map.
The first map was considered very hard because it has many local-minima solutions, since various walls
must be circumvented. For that reason the Potential Fields method had to take various turns, greatly
increasing the distance traveled. The PRM method only had a 50% success rate on this map.

The second map was considered easy, since the obstacles do not get in the way of the direct path
between the start and the goal. All the methods were successful.

The third map was ranked of medium difficulty. The obstacle present made it mandatory to go around it. All
the methods were successful.

The fourth map was graded hard because of the narrow passage the path had to find. The PF and
RRT methods were successful, but the PRM had only a 10% completion rate.

Based on these results, the best method depends on the type/difficulty of the map
and on the locations of the start and goal points. For easy and medium maps, all the methods are well
suited. For hard ones, the PF and RRT seem to yield more robust results. For very hard
maps, the RRT and PRM (although the success rate of the PRM was not 100%) appear to do a better
job, taking the distance into account.
Chapter 4
Navigation based on Monocular Vision
This chapter addresses a Simultaneous Localization and Mapping (SLAM) [8] methodology for a system
capable of performing visual inspections in an unknown environment, assumed to be a tubular
shaped structure (TSS), using a monocular camera [23]. Under the TSS assumption, a geometrical model
is developed which directly maps visual features from planar images onto a 3D representation, through
a back-projection procedure, thus enabling a scenario reconstruction composed of cylindrical segments.
We also adapt the tubular shape to a rectangular shape. The EKF framework allows reconstructing
the path described by the camera and the structure it visualizes, based on visual landmarks described by
feature detectors.
Estimating the motion of a camera moving inside a shaped structure involves considering various aspects.
Two important ones are the number of degrees of freedom of the camera motion and the shape
of the environment structure. A TSS includes straight and curved sections. The camera can move in 6D
(3D in translation and 3D in rotation). Motion can be estimated by registering the texture, retrieving
distinctive visual features, and dewarping into a mosaic. We approached the problem by first reconstructing
points of the scene and then fitting a simple 3D cylindrical model to the various tube sections,
which makes the dewarping step simple [18].
4.1 Depth Information Not Directly Available
The big difference between this method and the one presented in Chapter 3 is the sensor used. Here,
a simple monocular camera does not provide depth information for its pixels. Because of this, a new
way of estimating the motion of the camera is necessary. The SLAM methodology is used, with some
assumptions made a priori. A structural model for the environment is defined, and an EKF with a motion
model and an observation model works to estimate, among other things, the pose of the camera.
The diagram in Figure 4.1 summarizes the basic steps of mapping and allows comparing visual SLAM with
the color-depth methodology.
Figure 4.1: Basic steps in mapping and motion estimation for the Color-depth vs Visual SLAM.
4.2 Iterative Estimation of Tubular Structure and Camera Motion
Assuming the world is a TSS domain (Fig. 4.2), navigated by a forward-looking, hand-held-like monocular
camera smoothly moving at constant speed inside a tube without revisiting previous positions, one
can determine the features' 3D locations, which are pinned to a cylindrical section wall of radius
$\rho$. By carefully defining a state vector $s$ composed of the estimates of both the camera and the TSS geometrical
parameters, and incorporating a geometrical observation model reliable enough to produce relevant
estimates of the cylindrical section's geometrical parameters, simultaneous camera localization and
mapping can be achieved.
Figure 4.2: Tubular structure used for testing the system behavior.
4.2.1 State Vector
The state vector $s$ is a joint composition of the camera pose $x$ and the cylinder geometrical parameters $y$:

$$s_k = [\,x_k \,;\, y_k\,] \quad (4.1)$$

where the semicolon denotes vertical stacking of vectors.
Camera motion behavior is modeled with a constant-velocity dynamic model. The state vector $x_k$
provides the position $r^{WC}_k$, the orientation $q^{WC}_k$, and both the linear $v^W_k$ and angular $\omega^C_k$ velocities at every
instant, i.e.

$$x_k = \left[\, r^{WC}_k \,;\, q^{WC}_k \,;\, v^W_k \,;\, \omega^C_k \,\right]. \quad (4.2)$$
A cylindrical section is characterized by a 7-dimensional state vector $y^n_k$, an array of the cylinder
parameters: centre position $p^{nW}_k$, orientation $o^n_k$, and radius $\rho^n_k$ expressed in logarithmic form:

$$y^n_k = \left[\, p^{nW}_k \,;\, o^n_k \,;\, \log(\rho^n_k) \,\right]. \quad (4.3)$$
4.2.2 System Dynamics
The non-linear state transition model function $f$ is defined as the compilation of two independent state
transition processes, $f_x$ and $f_y$, for the camera and cylinder state vectors $x$ and $y$, which are influenced
by an additive zero-mean Gaussian transition noise $\epsilon \sim \mathcal{N}(0, Q_k)$:

$$f = [\,f_x \,;\, f_y\,]. \quad (4.4)$$
The constant-velocity model allows smooth velocity variations, with zero-mean Gaussian-distributed
acceleration noise $n_x = (a^W \; \vartheta^C)^T \sim \mathcal{N}(0, N_{x_k})$. Under this model assumption, and defining $V^W = a^W \Delta t$
and $\Omega^C = \vartheta^C \Delta t$ as the linear and angular velocity impulses over a transition step $\Delta t$, one
can express the camera state transition $x_{k+1} = f_x(x_k, n_{x_k})$ [1, 4]:

$$\begin{bmatrix} r^{WC}_{k+1} \\ q^{WC}_{k+1} \\ v^W_{k+1} \\ \omega^C_{k+1} \end{bmatrix} = \begin{bmatrix} r^{WC}_k + \left(v^W_k + V^W\right)\Delta t \\ q^{WC}_k \times q\!\left(\left(\omega^C_k + \Omega^C\right)\Delta t\right) \\ v^W_k + V^W \\ \omega^C_k + \Omega^C \end{bmatrix}. \quad (4.5)$$
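The mean prediction of equation 4.5 can be sketched as follows; the quaternion helpers use the Hamilton convention, an assumption since the text does not state one:

```python
import numpy as np

def quat_mult(a, b):
    """Hamilton product of quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def quat_from_rotvec(theta):
    """Quaternion q(theta) for a rotation vector theta (axis * angle)."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = theta / angle
    return np.concatenate([[np.cos(angle / 2.0)],
                           np.sin(angle / 2.0) * axis])

def predict_camera_state(r, q, v, w, dt, V=np.zeros(3), Omega=np.zeros(3)):
    """Constant-velocity prediction of the camera state (r, q, v, w) per
    equation 4.5; V and Omega are the velocity impulses from the
    acceleration noise (zero for the noiseless mean prediction)."""
    r_new = r + (v + V) * dt
    q_new = quat_mult(q, quat_from_rotvec((w + Omega) * dt))
    return r_new, q_new, v + V, w + Omega
```

In the EKF this mean prediction is paired with the Jacobians $F_k$ and $G_k$ of equation 2.30 to propagate the covariance.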
4.2.3 Observation Model
The observation model describes the process of implicitly representing observed features $\Lambda$ in a 3-D
parameterization as a function of the system's state parameters, by imposing constraints on the environment
structure and consequently on the features' 3-D locations. At an instant $k$ the state transition function $f$ is
applied to $s_k$, retrieving $s_{k+1} = f(s_k, n_k)$. $\Lambda_k$ is the input set of features, in pixel coordinates, acquired
at instant $k$, whereas $\Lambda_{k+1}$ is the set of features at the following instant $k+1$ and the observation model
output. The observation model can thus be written as a function $h$:

$$\Lambda_{k+1} = h_{k+1}(s_{k+1}, \Lambda_k) \quad (4.6)$$

Consider that the features in the set $\Lambda_k$ have 3-D locations on the TSS surface: they are the intersections
with the cylinder wall of rays leaving the camera centre $r^{WC}_k$ and piercing the 2-D image plane where all features lie.
With the information present in the state vector $s_k$, i.e. the camera pose $x_k$ and the cylinder section parameters
$y_k$, one can estimate the 3-D location of every feature in the set $\Lambda_k$ through a back-projection methodology,
and re-project these 3-D coordinates using the next instant's state vector $s_{k+1}$ onto the
2-D image plane, thus acquiring the features $\Lambda_{k+1}$.
4.3 Dewarping
The process of opening the tube (Figure 4.3), or dewarping its interior, consists of finding the transformation between the pixels in the scene viewed by the camera inside the TSS and the planar mosaic we want to build of the inside texture. In essence it is a conversion from polar to Cartesian coordinates; Figure 4.4 shows that relation. We first find the 3D position of each pixel relative to the camera's centre. After deciding how many samples we want, i.e. how many lines of the mosaic per step, and the depth we are able to see, we have the size of the new image. The position of each new pixel (u, v) is then a function of its 3D relation to the camera's optical centre. The depth (Z coordinate) determines the horizontal location in the mosaic, while the angle α (computed from the X and Y coordinates) determines the vertical location of the pixel in the mosaic. In this way we define a mapping function that assigns polar coordinates to each pixel in the image and, through a transformation, yields the corresponding pixel location in the new image, the planar mosaic. A final mosaic is stitched using a homography, showing the full inside texture of the tube. Some examples of dewarped images can be seen in Fig. 4.5 and Fig. 4.6.
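The pixel mapping just described can be sketched as follows (an illustrative sketch; the sampling parameters z_min, z_max, n_cols and n_rows are hypothetical names, not taken from the thesis):

```python
import numpy as np

def tube_point_to_mosaic(X, Y, Z, z_min, z_max, n_cols, n_rows):
    """Map a 3-D point on the tube wall, expressed in the camera
    frame, to a mosaic pixel (u, v): the depth Z sets the horizontal
    location, the angle alpha around the axis sets the vertical one."""
    u = (Z - z_min) / (z_max - z_min) * (n_cols - 1)     # horizontal: depth
    alpha = np.arctan2(Y, X)                             # angle around the axis
    v = (alpha + np.pi) / (2 * np.pi) * (n_rows - 1)     # vertical: angle
    return int(round(u)), int(round(v))
```

Applying this mapping to every visible wall point of one frame produces one strip of the mosaic; consecutive strips are then stitched into the full texture.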
Figure 4.3: Creating a mosaic with the interior texture of the tube.
Figure 4.4: (a) 3D coordinates of pixels in the image inside the tube. (b) Relation between the angle of the coordinate in the closed image and the vertical pixel in the mosaic. (c) Pixels in the mosaic.
4.3.1 Dewarping a VRML Tubular Shape
Datasets with real images, acquired from virtual or real cameras, introduce image processing frameworks to the developed system, incorporating image description techniques such as feature descriptor algorithms (e.g. SIFT, SURF), complemented by the RANSAC methodology for outlier removal. Figure 4.5 shows various steps of the process.
4.3.2 Dewarping a Real Tubular Shape
Another experiment consisted of evaluating the developed algorithm on a set of images retrieved from a real camera navigating inside a textured TSS. The setup and the results can be seen in Figure 4.6.
Figure 4.5: (a) VRML setup. (b) View of the camera inside the tube. (c) Dewarping of the tube for a single step. (d) Result of dewarping several steps.
Figure 4.6: (a) Real setup. (b) Inside view of the tube. (c) Dewarping of the image for one step. (d) Result of dewarping several steps.
4.4 Rectangular Section for a Corridor
To adapt this model from a tubular shape to our corridor model, some changes were needed. The idea was to switch the back-projection function to a rectangular section instead of a circular one; the state vector now holds the aspect ratio of the rectangular section in place of the cylinder radius. The concept is the same as before: find the 3D ray that starts at the camera optical centre C and pierces the image plane at the 2D projection of D, and then, using the known structure of the environment, compute the scaling factor α that represents the intersection between the 3D line and the corridor planes (Fig. 4.7):
M = C + αD (4.7)
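A sketch of this back-projection for a rectangular section, assuming a corridor frame with the z axis along the corridor and walls at x = ±half_w, y = ±half_h (a hypothetical parameterization, not the thesis code):

```python
import numpy as np

def backproject_rect(C, D, half_w, half_h):
    """Smallest positive alpha such that M = C + alpha*D lies on one
    of the four corridor planes x = +/-half_w, y = +/-half_h."""
    alphas = []
    for axis, bound in ((0, half_w), (1, half_h)):
        if abs(D[axis]) > 1e-12:
            for s in (+bound, -bound):
                a = (s - C[axis]) / D[axis]
                if a > 0:
                    M = C + a * D
                    # keep only intersections inside the section
                    other = 1 - axis
                    lim = half_h if axis == 0 else half_w
                    if abs(M[other]) <= lim + 1e-9:
                        alphas.append(a)
    return min(alphas) if alphas else None
```

The first (smallest) positive intersection is the visible wall point; rays parallel to both wall pairs return None.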
Figure 4.7: Back-projection model for a rectangular section.
4.4.1 Results obtained for the synthetic case
In this setup, a synthetic corridor was simulated by creating a point cloud of features (Fig. 4.8a) consisting of a rectangular section repeated along the z axis. The camera was then moved along that axis, starting at the origin until the point (0, 0, 5) was reached, and at each step it captured the projection of the features on its image plane. This was the dataset used in the simulation. In total there were 50 steps with a step size of 0.1 m. In the starting configuration file, the initial assumption of the camera position was purposely set wrong, to (0.5, 0, 0), to see if the filter would correct the state. The resulting plot of the camera pose in world coordinates exhibits a clear correction by the filter (Figure 4.8b and Figure 4.8c). The plots in Figure 4.8d and Figure 4.8e show in more detail the evolution of the state of the filter for camera position and orientation, respectively.
Figure 4.8: Results for the synthetic corridor. (a) Synthetic point cloud. (b) Synthetic section with ground truth (green): perspective view. (c) Synthetic section with ground truth (green): top view. (d) State of the filter for the position of the camera. (e) State of the filter for the orientation of the camera (Euler angles).
4.4.2 Conclusion
This chapter aimed at presenting a robust model to solve the Simultaneous Localization and Mapping problem with a priori structural knowledge of the environment, assumed to be first a TSS and then a rectangular tube. The camera's pose estimate and the camera trajectory can be recovered, up to a scale factor, assuming the constant-velocity dynamic model, allowing a posterior description of the environment structure and its reconstruction.
Chapter 5
Experiments
In this chapter a number of experiments are conducted to validate the mapping and navigation methods studied and introduced in previous chapters. The chapter describes the experiments and the results obtained from them.
5.1 The Experimental Setup
A virtual world created in VRML (Virtual Reality Modeling Language [5]), seen in Figure 5.1, is used to provide the imaging data that a robot can acquire. An example of a point of view (POV) for the robot inside the virtual world at various consecutive steps can be seen in Figure 5.2. Several 3D points coinciding with the VRML corners make up the facets used to simulate the depth sensor. The virtual world has textures on its walls that we will use to scan for visual features.
Figure 5.1: VRML model used for visualization. (a) VRML world wireframe. (b) VRML world with textures.
Figure 5.2: Various consecutive steps (1 to 9) of the POV of the robot inside the virtual world.
5.2 Single Image Global Localization
Usually the first mission of a robot in an unknown scenario is an exploration run to map the environment. During that run the robot periodically, after a certain number of steps, saves in its memory the SIFT descriptors and the map location of the image captured. The objective is that, if we randomly choose an RGB image from any past location, we can compute its SIFT descriptors and compare them with all the other SIFT descriptors in the database. The result can be a direct match, when the step we chose to locate happens to be one that was saved; otherwise we choose the image with the best score from the matching function, which should yield the best option to locate the robot. Essentially the method compares images and finds the most similar one. The main problems with this approach are the textures (some textures are not good for the SIFT method) and redundancy (repeated or similar images in different places).
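The best-match selection can be sketched as nearest-neighbour descriptor matching with Lowe's ratio test, scoring each database image by its match count (an illustrative numpy sketch over precomputed descriptor arrays; the ratio value is an assumption):

```python
import numpy as np

def match_count(query_desc, db_desc, ratio=0.8):
    """Number of query descriptors whose nearest neighbour in
    db_desc passes Lowe's ratio test against the second nearest."""
    count = 0
    for d in query_desc:
        dists = np.linalg.norm(db_desc - d, axis=1)
        order = np.argsort(dists)
        if len(order) > 1 and dists[order[0]] < ratio * dists[order[1]]:
            count += 1
    return count

def localize(query_desc, database):
    """Return the map location whose stored descriptors score best.
    `database` is a list of (location, descriptors) pairs."""
    scores = [match_count(query_desc, desc) for _, desc in database]
    return database[int(np.argmax(scores))][0]
```

In practice the descriptors would come from a SIFT extractor; here any fixed-length descriptor arrays work.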
In Figure 5.3 we can see examples of all the possible outcomes: correct correspondences, exact correspondences and incorrect correspondences. Figure 5.3 also shows a map of the environment. The points represent the locations where the RGBD images were taken. The yellow points represent where the SIFT features were saved; in this case they were saved every 3 steps. The blue points represent the points where the robot did not record any information. The green cross is the actual location of the image we are trying to locate and the red circle is the location of the strongest match returned by the algorithm.
The first case (step 5) is a correct one because the strongest match is the closest image (Figure 5.3b) with recorded features, as we can see on the map. The second case (step 81) is also correct, for the same reasons (Figure 5.3c); looking at the map, the green cross and the red circle are very close. The third case (step 100) is an exact match: the image we are trying to locate (Figure 5.3d) is one that is also stored in memory, so its SIFT features are exactly the same and we obtain its correct location. The fourth and final case (step 32) is a mismatch. From the RGB image (Figure 5.3e) we can speculate that the reason for the mismatch is the repeated texture that exists in several locations in the environment. The input image (green cross in the map) is located at the end of the second corridor and the output image (red circle, the strongest match) is in the third corridor.
The overall performance of the algorithm is around 80% correct correspondences, 8% almost correct (the output is not the closest point with recorded SIFTs, but it is close, i.e. in the same corridor) and 12% incorrect correspondences.
Figure 5.3: Map with robot ground truth locations (green crosses) and estimated locations (red circles) based on query and database images (a). The query images for the various steps (5, 81, 100 and 32) are on the left and the retrieved matching images on the right (b, c, d, e). (a) Global map with the locations. (b) Query image (step 5) and retrieved image. (c) Query image (step 81) and retrieved image. (d) Query image (step 100) and retrieved image. (e) Query image (step 32) and retrieved image.
The results obtained show that this method works well overall, but there are some problematic areas of the map that it consistently mismatched. For a better success rate in localizing, the robot cannot rely on a single match. The use of particle filters to localize the robot based on the sensor readings [10], which in this case are the SIFT descriptors from the RGB images, allows the robot to recover its correct location within a limited number of sequential images. In this method, given a map of the environment, the algorithm estimates the position and orientation of the robot as it moves and senses the environment. The algorithm uses a particle filter to represent the distribution of likely states, with each particle representing a possible state, i.e. a hypothesis of where the robot is. Whenever the robot moves, it shifts the particles to predict its new state after the movement. When the robot senses something, the particles are resampled based on recursive Bayesian estimation, i.e. on how well the actual sensed data correlate with the predicted state. The particles should converge towards the actual pose of the robot.
5.3 Motion Estimation
Motion estimation, as mentioned before, is the process of determining the movement of the camera from one image to the next. The path given to the camera is a pre-programmed one, as seen in Figure 5.4. The next figures show the intermediate steps of the motion estimation used in the mapping and navigation process. In Figure 5.5a we plot the corresponding matched SIFT features in two consecutive steps. Some of these correspondences are not correct, so we apply the RANSAC algorithm with the rigid transformation to search for inliers; those can be seen in Figure 5.5b. Using the inliers from consecutive steps we can compute two point clouds and use the Procrustes analysis to find the transformation between them. Figure 5.5c depicts both point clouds, one in blue and the other in red, with the matching drawn as green lines, and shows the pose of the camera before and after the computed transformation. If we superimpose all the estimated poses from all the steps into one plot we get Figure 5.6, which illustrates the total estimated trajectory of the camera. The resulting occupancy grid map, obtained from the camera matrices and the depth information (Figure 5.7a) at each location, is in Figure 5.7c.
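The Procrustes step, recovering the rigid transformation between two matched point clouds, can be sketched with the SVD-based (Kabsch) solution (a generic sketch, not the thesis code):

```python
import numpy as np

def rigid_transform(P, Q):
    """Least-squares rotation R and translation t with Q ~= R @ P + t,
    for matched 3xN point clouds (Procrustes / Kabsch analysis)."""
    p0 = P.mean(axis=1, keepdims=True)        # centroids
    q0 = Q.mean(axis=1, keepdims=True)
    H = (P - p0) @ (Q - q0).T                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # guard against a reflection solution
    S = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    t = q0 - R @ p0
    return R, t
```

The inverse of the recovered (R, t) is then taken as the camera motion between the two frames.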
Figure 5.4: Pre-determined path used by the robot to go through the environment and map it.
Figure 5.5: Color-depth camera used for motion estimation. (a) Matched SIFT features in red. (b) Inliers from RANSAC in yellow. (c) Two point clouds from consecutive steps (inliers) and the position of the camera before and after the estimated motion.
Figure 5.6: All the estimated positions for the camera, resulting from the motion estimation.
5.4 Occupancy Grid
We assume that the camera at its first localization defines the origin of our world. In this case, our world consists of four corridors that form a closed loop. We move the camera throughout the environment and at each step we grab an RGB image and a depth image. Starting at step two, we compute the SIFT features from the current and previous RGB images and, using these, we can compute an estimate of the transformation between the current and previous step. At the same time we use the Kinect to detect whatever is in front of the robot. Combining this information we can construct a map incrementally: at each step we compute how much the camera has moved with respect to the last step and, based on that, we can place on the map all the obstacles the sensor detects. The map we are building is an occupancy grid map that stores in each cell the probability that the cell is occupied. It is continually updated with new information. The probabilities that the sensor detects, or fails to detect, an obstacle are set beforehand. After the run is made and the occupancy grid is finished, we have a matrix with values between zero (free cell) and one (occupied cell).
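The per-cell update is commonly carried out in log-odds form with the preset hit/miss probabilities; the values below are illustrative, not the ones used in the thesis:

```python
import numpy as np

def update_cell(logodds, hit, p_hit=0.7, p_miss=0.4):
    """Bayesian log-odds update for one grid cell: add the log-odds
    of the sensor model for a 'hit' (obstacle seen) or 'miss'."""
    p = p_hit if hit else p_miss
    return logodds + np.log(p / (1 - p))

def probability(logodds):
    """Recover the occupancy probability from the log-odds value."""
    return 1 - 1 / (1 + np.exp(logodds))

l = 0.0                        # prior: probability 0.5
for _ in range(3):             # three consecutive 'occupied' readings
    l = update_cell(l, hit=True)
```

Repeated consistent readings drive the cell probability towards one (occupied) or zero (free), which is the continual update described above.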
The pre-determined path used to move our sensor around the environment consists of wandering through four corridors (simulating the corridors on the 5th floor of the North Tower at IST) that form a square and thus a closed loop. Periodically, after a certain number of steps, the robot stores the SIFT features detected at that step in its database. If, later in the mapping process, the estimated current position of the robot approaches the initial starting position, we search the database of SIFT features from that area for the best match. Using it we can compute the error in the final position, which we then use in the loop closure correction. Figures 5.7d and 5.7e illustrate the occupancy grid map achieved before and after the correction for the loop closure.
Figure 5.7: All the steps to create a map used for navigation. (a) Depth image with scan line. (b) All the scan lines superimposed in the world. (c) Occupancy grid. (d) Occupancy grid map without loop closure; the area where the loop closes is highlighted with a red rectangle. (e) Occupancy grid map with loop closure; the area where the loop closes is highlighted with a red rectangle. (f) Occupancy grid map. (g) Initial bitmap. (h) Dilated bitmap.
5.5 Patrol #1 - Run with no obstacles
In this experiment we simulate the navigation of the robot. Using the map obtained by the mapping procedure, we plan our path and send that information to the robot. The robot POV is simulated with the VRML model as in Figure 5.8a. A mission is the procedure of following a certain path to reach that mission's goal point. These goal points are listed in the mission cycle, which holds the key points or favorite locations in the map, i.e. where we want our robot to pass. At each step we simulate our sensor as a rangefinder: our Kinect is used as a LIDAR to detect walls or other obstacles that may block our path. A list of 3D points comprising the facets where our obstacles are simulated is used. When something is detected, as in Figure 5.8b, its position in the global map is computed using the current estimate of the robot position. Anything that crosses our path is considered an obstacle. Once the robot detects something in its path, it stops and recomputes the path considering the new map with the obstacle in it. The obstacles are considered to be temporary, meaning that they only exist during the current mission.
Figure 5.8: POV of the robot when there is an obstacle (5.8b) and when there is none (5.8a). (a) Free path. (b) Obstacle blocking the path. (c) Depth image for a free path. (d) Depth image for an obstacle.
In this experiment we start by doing some processing on the occupancy grid matrix. The matrix is saturated with a chosen threshold (25%) and we end up with a bitmap of the same size containing only zeros and ones. After that, the map is dilated to take into account that the robot is not just a point in the map: its body has a certain size, so the obstacles (walls) are enlarged by 3 pixels (30 cm) to account for the robot's radius. Figures 5.7f, 5.7g and 5.7h show the various stages the map goes through.
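The saturation and dilation steps can be sketched as follows (a plain-numpy sketch; the square structuring element is an assumption):

```python
import numpy as np

def grid_to_planning_map(occ, threshold=0.25, radius=3):
    """Saturate the occupancy grid into a bitmap, then dilate the
    obstacles by the robot radius (3 cells = 30 cm here)."""
    bitmap = occ > threshold               # 1 = occupied
    out = bitmap.copy()
    rows, cols = np.nonzero(bitmap)
    for r, c in zip(rows, cols):           # grow each obstacle cell
        r0, r1 = max(r - radius, 0), min(r + radius + 1, occ.shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 1, occ.shape[1])
        out[r0:r1, c0:c1] = True
    return out
```

Planning on the dilated bitmap lets the robot be treated as a point while still keeping its body clear of the walls.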
We use this map (Figure 5.7h) as one of the inputs to the sentinel mode function. The other necessary inputs are the initial pose of the robot, the list of points of interest we want our robot to go through, and the list of facets used to simulate the Kinect when measuring depth information. The sentinel function starts with the first point of interest in the list and plans the path necessary to reach that point. Once it has a possible path, it starts a mission to move the robot there. The mission consists of moving the robot, at each step, to the next point on the path. What it actually does is move the viewpoint in the VRML world to that new location, simulating in this way what the robot is seeing. The commands given to the VRML to update the viewpoint position are incremental translations in a specific direction: if the next point on the planned path is the pixel to the North in the map, the command 'Go one step North' is sent to the VRML, which translates it into 'Move one step to my North'. Every time the robot moves, a snapshot of the RGBD is taken, and the depth information is used to check for obstacles in the path and to estimate the motion made. This information is then returned to the sentinel function to check if it moved the right amount.
If any obstacle lies on the planned path, the robot stops and re-plans, with the updated map, a new path from the position where it detected the obstacle to the same location it was originally heading for. Once it reaches the objective it is aiming for, the mission is considered a success, and the sentinel starts a new mission to the next point of interest until there are none left. For this first experiment no obstacles were put in place. The list of points the robot had to visit was the following: it starts at (x = 20; y = 10) and has to visit, in order, (100; 10) → (190; 100) → (100; 190), then return to the starting point (20; 10).
The path planning method chosen for this experiment was Probabilistic Roadmaps. In Figure 5.9 we can see the various missions the robot had to complete to reach all the locations in its mission cycle. The total path traversed by the robot can be seen in Figure 5.9e.
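A minimal Probabilistic Roadmap over such a bitmap can be sketched as follows (an illustrative sketch: the sample count, connection radius and straight-line collision check are assumptions, and the thesis implementation may differ):

```python
import numpy as np

rng = np.random.default_rng(1)

def segment_free(bitmap, a, b):
    """Collision check: sample points along the segment a-b."""
    for t in np.linspace(0, 1, 50):
        p = (1 - t) * a + t * b
        if bitmap[int(p[0]), int(p[1])]:
            return False
    return True

def prm_path(bitmap, start, goal, n_samples=150, connect_r=40):
    """Minimal probabilistic roadmap: sample free cells, link nearby
    mutually visible samples, then BFS from start to goal."""
    nodes = [np.asarray(start, float), np.asarray(goal, float)]
    while len(nodes) < n_samples:
        p = rng.uniform([0, 0], bitmap.shape)
        if not bitmap[int(p[0]), int(p[1])]:
            nodes.append(p)
    adj = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if (np.linalg.norm(nodes[i] - nodes[j]) < connect_r
                    and segment_free(bitmap, nodes[i], nodes[j])):
                adj[i].append(j)
                adj[j].append(i)
    # BFS on the roadmap (node 0 = start, node 1 = goal)
    frontier, parent = [0], {0: None}
    while frontier:
        u = frontier.pop(0)
        if u == 1:
            path, v = [], 1
            while v is not None:
                path.append(nodes[v])
                v = parent[v]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return None
```

Re-planning after an obstacle detection amounts to marking the obstacle cells in the bitmap and calling the planner again from the current position.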
Figure 5.9: All the missions the sentinel function created, one for each goal point. (a) Mission 1. (b) Mission 2. (c) Mission 3. (d) Mission 4. (e) Total path of the robot after completing all the missions.
In this experiment, we successfully completed our objective of visiting a list of places in the map and returning to the initial location.
5.6 Patrol #2 - Unknown blocked passage
In this second navigation experiment we repeat the same list of points of interest to tour, except this time we place an obstacle somewhere in the map. The only different input here is the facets list used to retrieve the depth information, and thus used to simulate an obstacle. The sentinel function works exactly as before, except that when a mission detects an obstacle in its path it aborts. In these cases the sentinel function receives an aborted mission and has to re-plan a new path considering the detected obstacle. After computing a new trajectory, a new mission is started. By design the obstacles are considered temporary and only remain at that location for the duration of the mission (when a new mission starts, all obstacles are cleared). This is because all the obstacles detected only during the sentinel mode, and not during the mapping mode, are considered not to be structural, fixed obstacles but only temporary ones (e.g. chairs). The obstacle used in this experiment (see Fig. 5.8b) was placed in a manner that completely blocked the first corridor (x = 50 and 0 ≤ y ≤ 20), effectively closing it down. The path planning method chosen for this experiment was Probabilistic Roadmaps. The trajectories that each mission planned are presented in Figure 5.10. The total path traversed by the robot can be seen in Figure 5.10f.
Figure 5.10: All the missions the sentinel function created. There is one extra mission because of the obstacle present in the environment. (a) Mission 1. (b) Mission 2, with the obstacle present in the map. (c) Mission 3. (d) Mission 4. (e) Mission 5. (f) Total path the robot had to make to visit all the points.
In the first mission (Figure 5.10a) the planned path was essentially a straight line from the initial point to the first objective. However, the obstacle present in the map lay between them and was detected early in the mission, so the first mission was aborted and a new mission started (Figure 5.10b) with a new path computed on the updated map. The obstacle is only present in the second mission because, as mentioned above, obstacles are temporary and only taken into account during that mission. Because of the obstacle, we see in the total path (Figure 5.10f) that the robot has to go around it (in this case, find a new path to reach its objective by going the other way, since the map is a closed loop) and has to take a much longer route to reach its goals. Despite the obstacle, it was still able to visit the pre-determined locations on the map and complete its patrol, starting and finishing in the same place.
5.7 Mapping and Navigation with Monocular Vision
An experiment was conducted to test the tubular-shape based VSLAM approach with textures in a VRML corridor. For this dataset the camera went from (1, 1, 2) to (1, 1, 12), so it traveled 10 meters in the z direction. All the walls (including the floor and ceiling) of the corridor were covered with paintings (Figure 5.2) to give better texture for the visual feature detector to find and track. In this case, the initial assumptions for the camera position were correct. There were 80 steps with a step size of 0.125 m. The resulting plot of the camera poses in world coordinates is in Figure 5.11a. Figures 5.11b and 5.11c show how the state of the filter evolved over time, for position and orientation.
Figure 5.11: Results for one corridor. (a) Camera poses returned by the filter. (b) State of the filter for the position of the camera. (c) State of the filter for the orientation of the camera (Euler angles).
From the plot we can conclude that the camera moved essentially in the z direction, as expected. The drift in the orientation of the camera can be attributed to the noise present in the quality of the detected features.
To test the VSLAM method for creating a map, a test was conducted using the approach explained in Chapter 4. In this test, the camera moves along one corridor and, based on the information retrieved by the filter, a sketch of the environment is made. This sketch is drawn as two straight parallel lines with the direction of the corridor updated by the filter at each step. It results in a series of line segments that are positioned on the map according to the filter's estimate of the camera position in the world and in relation to the walls (Figure 5.12a). The result is straightforward, and the corridor is visible in the final map in Figure 5.12b. In the next test the camera moves along two corridors connected by a 90-degree corner. The results for this test are not as good. The resulting map is not a good representation of the environment because of the rotation the camera undergoes when exiting one corridor and entering another. The structural assumption about the environment used in the observation model of the Kalman filter does not take this into account and fails to keep track of the back-projection of features during the rotation. It therefore fails to understand how the camera is moving (Fig. 5.12c) and cannot update the direction of the corridor correctly. This is visible in the final map, seen in Figure 5.12d.
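The sketching step, drawing two short wall segments per filter update, can be illustrated as follows (hypothetical parameter names; a 2-D top-view sketch, not the thesis code):

```python
import numpy as np

def wall_segments(cam_pos, direction, half_width, step_len):
    """Two short wall segments, parallel to the estimated corridor
    direction, placed half_width to each side of the camera (2-D)."""
    d = direction / np.linalg.norm(direction)
    n = np.array([-d[1], d[0]])            # normal to the corridor direction
    left = (cam_pos + half_width * n,
            cam_pos + half_width * n + step_len * d)
    right = (cam_pos - half_width * n,
             cam_pos - half_width * n + step_len * d)
    return left, right
```

Concatenating these segments over all filter steps produces the sketches shown in Figures 5.12a and 5.12c.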
Since the objective of this dissertation is to study and implement an application that is ultimately able to map the environment and then navigate in it, we use the map obtained for one corridor and navigate in it by visiting a sequence of points of interest, much like taking a tour. The list of points followed was: (10, 50) → (10, 90) → (10, 140), then returning to the starting point (10, 20). The trajectory planning method chosen was Probabilistic Roadmaps. Figure 5.13 shows the paths taken to reach each point, and the total path can be seen in Figure 5.13e.
Figure 5.12: Map representation for one and two corridors. (a) Sketch representing the direction of the tube in each step for one corridor. (b) Occupancy grid map for one corridor. (c) Sketch representing the direction of the tube in each step for two corridors. (d) Occupancy grid map for two corridors.
Figure 5.13: Result for the VSLAM experiments. (a) Mission 1. (b) Mission 2. (c) Mission 3. (d) Mission 4. (e) Total path of the robot after completing all the missions.
We conclude that the robot visited all the locations on the list, as it was supposed to.
Chapter 6
Conclusion and Future Work
The work described in this thesis aimed at developing and simulating a robotic sentinel that first maps an indoor environment and secondly is able to navigate through it, computing alternative trajectories when obstacles block the previously computed ones. The main sensor available to the robot is a single video camera.
The first video camera considered was a Color-Depth (RGBD) camera. With the depth information available, we were able to infer the 3D position of the features. By using consecutive image frames, we estimated the motion of the camera as a rigid transformation. The process is the following: first, SIFT features are computed in consecutive images and matched; then the matched 2D features are converted to 3D, in the camera coordinate frame; the next step consists simply of computing a rigid transformation between both point clouds; the computed transformation is the (inverse) motion the camera made between capturing the two images.
In addition to estimating the motion of the camera, the color-depth camera allows computing a horizontal depth profile, which can be integrated over time to form an occupancy grid. The occupancy grid, in turn, allows planning trajectories.
The second video camera considered was simply a Color (RGB) camera. Given the lack of depth information, we introduced an Extended Kalman Filter based on the camera model and a constant-velocity motion model, thereby forming a visual-SLAM framework. In particular, we considered that the scenario can be described by a tubular shape, which gives further consistency to the measurements. The tubular shape hypothesis has the additional benefit of keeping the memory footprint small and constant, as the parameters describing the tube are far fewer than the number of registered features.
Both types of cameras allowed, in the end, computing an occupancy grid and planning trajectories. Navigation results were promising in both cases, as the robot visits all the predefined locations and, if an obstacle blocks its path, tries to go around it and reach the objective through an alternative path.
The use of the color (only) camera is constrained in our work by the tubular shape model. In particular, a single straight section is considered at each moment, and consequently the EKF over-smooths the camera motion at corridor corners. The navigation methodology based on the color-depth camera allows a more general world structure, but is limited by the capabilities of the camera. In some cases, the smoothing properties of an EKF are desired. In the end, a combination of both methodologies may give the best result.
As future work, the visual SLAM method can be improved with a world model taking into account multiple corridors. The explicit modeling of corners, or a function that detects the exit from one corridor and the entrance into the next, would decrease the over-smoothing of the robot trajectories. Smoothed trajectories allow a topological-like navigation, where a robot can return home and reset the localization error. The explicit modeling of corners would also help the detection of closed loops, and therefore take more advantage of graph-based trajectory planning.
Bibliography
[1] Ted J. Broida and Rama Chellappa. Estimating the kinematics and structure of a rigid object from a sequence of monocular images. IEEE T-PAMI, 13(6):497–513, June 1991.
[2] Ted J. Broida, Rama Chellappa, and S. Chandrashekhar. Recursive 3-d motion estimation from a monocular image sequence. IEEE Transactions on Aerospace and Electronic Systems, 26(4):639–656, July 1990.
[3] Stefano Carpin. Randomized motion planning: a tutorial. I. J. Robotics and Automation, 21(3), 2006.
[4] Javier Civera, Andrew J. Davison, and J. M. M. Montiel. Inverse depth parametrization for monocular SLAM. IEEE Transactions on Robotics, 24(5):932–945, October 2008.
[5] W3 Consortium. Virtual reality modeling language. http://www.w3.org/MarkUp/VRML/.
[6] Peter I. Corke. Robotics, Vision & Control: Fundamental Algorithms in Matlab. Springer, 2011.
[7] Andrew J. Davison, Ian D. Reid, Nicholas D. Molton, and Olivier Stasse. MonoSLAM: Real-time single camera SLAM. IEEE T-PAMI, 29(6), 2007.
[8] Hugh Durrant-Whyte and Tim Bailey. Simultaneous localisation and mapping (SLAM): Part I, the essential algorithms. IEEE Robotics and Automation Magazine, 13(2), 2006.
[9] Albert Elfes. Using occupancy grids for mobile robot perception and navigation. Computer, 22(6):46–57, 1989.
[10] Dieter Fox, Sebastian Thrun, Wolfram Burgard, and Frank Dellaert. Particle filters for mobile robot localization, 2001.
[11] Michael A. Goodrich. Potential fields tutorial.
[12] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[13] R. E. Kalman. A new approach to linear filtering and prediction problems. Transactions of the ASME, Journal of Basic Engineering, 82, Series D:35–45, 1960.
[14] J. J. Kuffner and S. M. LaValle. RRT-connect: An efficient approach to single-query path planning. In IEEE Int. Conf. on Robotics and Automation, pages 995–1001, 2000.
[15] S. LaValle. http://mappingignorance.org/2013/02/14/rapidly-exploring-manifolds-when-going-from-a-to-b-aint-easy/.
[16] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[17] D. Pereira, J. Tomaz, J. Gaspar, and R. Ferreira. Mosaicing the interior of tubular structures. In RecPad, 2013.
[18] Luís Ruivo. Omni-sonda: Câmara de visão omnidireccional para inspecção visual. Master's thesis, DEEC/IST, 2007.
[19] Shaojie Shen, Nathan Michael, and Vijay Kumar. 3D indoor exploration with a computationally constrained MAV, 2011.
[20] Simeon87. http://en.wikipedia.org/wiki/Motion_planning.
[21] R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In I. J. Cox and G. T. Wilfong, editors, Autonomous Robot Vehicles, volume 8, pages 167–193. Springer-Verlag New York, Inc., 1990.
[22] S. Thrun. Particle filters in robotics. In Proceedings of the 17th Annual Conference on Uncertainty in AI (UAI), 2002.
[23] João Pedro Borralho Tomaz. Visual self-localization and mapping using the inside texture of a tubular structure. Master's thesis, Instituto Superior Técnico, 2013.
[24] Taylor Wang. http://taylorwang.wordpress.com/2012/04/06/collision-free-path-planning-using-potential-field-method-for-highly-redundant-manipulators/.
[25] Wikipedia. Microsoft Kinect. http://en.wikipedia.org/wiki/Kinect.
[26] Wikipedia. Rodrigues' rotation formula. http://en.wikipedia.org/wiki/Rodrigues_rotation_formula.
[27] WowWee. WowWee Rovio. http://www.wowwee.com/en/products/tech/telepresence/rovio/rovio.
[28] Kai M. Wurm, Armin Hornung, Maren Bennewitz, Cyrill Stachniss, and Wolfram Burgard. Oc-
tomap: A probabilistic, flexible, and compact 3d map representation for robotic systems. InIn
Proc. of the ICRA 2010 workshop, 2010.
BIBLIOGRAPHY 59
[29] Gem-Sun Young and Rama Chellappa. 3-d motion estimation using a sequence of noisy stereo
images.IEEE T-PAMI, pages 710–716, June 1988.