Autonomous Driving of Competition Robot
Localization and Environment Perception
Ruben Daniel Nunes Capitao
Thesis to obtain the Master of Science Degree in
Electronics Engineering
Supervisor: Pedro Manuel Urbano de Almeida Lima
Examination Committee
Chairperson: Pedro Miguel Pinto Ramos
Supervisor: Pedro Manuel Urbano de Almeida Lima
Member of the Committee: Carlos Baptista Cardeira
October 2018
Acknowledgments
Firstly, a huge thanks to Francisco for sharing this thesis, and for all the time, effort, and dedication to this project; without him, this thesis would not have been possible. To my supervisor, Pedro Lima, for his guidance and advice whenever I needed them most. I would also like to share my words of appreciation with all the people involved in N3E and N3E Robotics, for always sharing their ideas and knowledge, and for their much appreciated help when it comes to the competitions' bureaucracies. Also, to my parents and girlfriend, for consistently providing all the necessary means and support for me to accomplish all I ever wanted.
To each and every one of you, I express these words of gratitude!
Abstract
With the objective of competing in the Autonomous Driving challenge of the Portuguese Robotics Open 2018, an autonomous mobile robot based on a remote-controlled car was entirely designed, implemented and tested, including its hardware and software. Given its broad scope, the full extent of the competition challenges is covered across two parallel theses. On the hardware side, the present thesis focuses on the electronics of the robot; on the software side, on the localization and environment perception modules. It proposes an approach different from traditional lane-following techniques, in which the robot is fully aware of its surroundings and of its absolute localization. Although the track is planar, i.e., without any surrounding walls, a novel algorithm is developed to generate virtual walls over the lines of the track, enabling occupancy-grid-based localization using Monte Carlo Localization. The same algorithm is capable of identifying the track and segmenting it from the obstacles standing on it, allowing their detection. Finally, the recognition of semaphores and traffic signs based on state-of-the-art approaches, namely image processing and machine learning, is also developed.
Keywords
Autonomous Driving; Localization; Perception of the Environment; Robot Operating System.
Resumo
With the objective of competing in the Autonomous Driving challenge of the Festival Nacional de Robótica 2018, an autonomous car based on a remote-controlled car was created, implemented and tested, including its hardware and software. Given its complexity, the full set of competition challenges is presented across two parallel theses. Regarding the hardware, the present thesis focuses on the electronics of the robot and on the localization and environment perception software modules. This thesis proposes an approach different from traditional line-following techniques. The robot is fully informed about its surroundings and about its absolute localization. Although the track is planar, i.e., without surrounding walls, a new algorithm is developed to generate virtual walls along the lines of the track, enabling the use of occupancy grids with Monte Carlo Localization. The same algorithm is capable of identifying the track and segmenting it from the obstacles on it, allowing their detection. Finally, the modules necessary for the recognition of semaphores and traffic signs are also developed, based on state-of-the-art approaches, namely image processing and machine learning.
Palavras Chave
Autonomous Driving, Localization, Perception of the Environment, Robot Operating System.
Contents
1 Introduction 1
1.1 Purpose and motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Goals and challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Original Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Document organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Related Work 7
2.1 Robot pose estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1.A Bayesian framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1.B Gaussian filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1.C Particle filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Sensors and techniques used for pose estimation . . . . . . . . . . . . . . . . . . 11
2.1.2.A Wheel odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2.B Visual odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2.C Inertial Measurement Unit . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2.D Sensor Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Perception of the Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1.A Types of representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1.B Static vs Dynamic environments . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Traffic sign detection and recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3.A Color-based detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3.B Shape-based detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.3.C Hybrid Color-Shape-based detection . . . . . . . . . . . . . . . . . . . . 18
2.2.3.D Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 System Architecture and Components 20
3.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 Robot chassis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.2 Actuators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3 Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.4 Hardware interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.5 Computing platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Software model of the robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.3 Software modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3.A Sensor calibration parameters . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3.B Ground, lines, and obstacles detection . . . . . . . . . . . . . . . . . . . 27
3.2.3.C Relative Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3.D Global Localization and Mapping . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.3.E Semaphore and Traffic Sign Recognition . . . . . . . . . . . . . . . . . . 28
4 Implementation 29
4.1 Hardware Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Power regulation and interface with small sensors and actuators . . . . . . . . . . 30
4.1.2 Bridging between the low-level hardware and the computing platform . . . . . . . . 31
4.2 Sensor calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Low-level sensors calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Kinect sensors calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Ground, lines and obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1 Ground plane segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1.A Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1.B User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.1.C Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2 Road lines extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2.A Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2.B User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.2.C Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.3 Obstacle Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.3.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.3.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Relative Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1 Visual Odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1.A User-Defined inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1.B Implementation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.2 Sensor Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.2.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4.2.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5 Global Localization and Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.1 Global Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.1.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.1.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.5.2 Absolute Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.5.2.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5.2.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.6 Semaphores and Traffic Signs Detection and Recognition . . . . . . . . . . . . . . . . . . 48
4.6.1 Semaphore Detection and Recognition . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6.1.A Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6.1.B User-defined inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.1.C Implementation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.2 Traffic Sign Detection and Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.3 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5 Evaluation 52
5.1 Performance of the modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.1 Ground and obstacles segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.1.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.1.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.2 Road lines to virtual walls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.2.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.2.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.3 Obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.3.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.3.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.4 Semaphore Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.4.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.4.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.5 Traffic Sign Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.5.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.5.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.6 Global Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.6.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.6.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.7 Global Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.7.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.7.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Performance of the system on the competition environment . . . . . . . . . . . . . . . . . 62
5.2.1 Robotics Open Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6 Conclusions and Future Work 64
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
A Printed Circuit Board 76
B Competition results 79
List of Figures
1.1 Track overview with zebra crossing, obstacle, and tunnel. Obtained from [1]. . . . . . . . . 3
1.2 Different instructions provided by the semaphores used in the competition. . . . . . . . . 3
1.3 Different traffic signs available on the competition. . . . . . . . . . . . . . . . . . . . . . . 4
3.1 Proposal of functional architecture, including both Hardware and Software modules . . . . 21
3.2 Robot chassis overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 tf2 transforms of the designed robot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Visual description and representation of the ground and obstacles segmentation algorithm. 36
4.2 Visual description and representation of the generation of the virtual laser scan from an
input colored point cloud. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Script interface used for the calibration of the CIELAB’s channels range values. . . . . . . 41
4.4 Visual demonstration of the output of the obstacle detection node, in red, from the
input obstacles point cloud, in green. . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Binary templates used by the template matching method. . . . . . . . . . . . . . . . . . . 50
4.6 Visual demonstration of the YOLO implementation. . . . . . . . . . . . . . . . . . . . . . . 51
5.1 Comparison between the testing and the (real) competition environment. . . . . . . . . . 53
5.2 Visual demonstration of the number of points belonging to the ground when an obstacle
is 10 cm from the car's front . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 Visual demonstration of the effect of distance on the number of points belonging to a track's line. 56
5.4 Global map of the track captured by GMapping on the test environment. . . . . . . . . . . 60
5.5 Visual demonstration of the path provided by the global localization, represented in
green, and by the relative pose estimation, represented in red. . . . . . . . . . . . . . 62
A.1 Circuit board . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
A.2 Circuit schematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
List of Tables
1.1 System specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5.1 Ground and obstacles segmentation module performance on the different testing scenar-
ios, over 10,000 different point clouds for Scenario A and 1,030 for Scenario B . . . . . 55
5.2 Results of the virtual walls node based on distance for 5000 iterations. . . . . . . . . . . . 57
5.3 Results of the obstacle detection node based on distance for 5000 iterations. . . . . . . . 57
5.4 Results of the semaphore detection node for the different scenarios. . . . . . . . . . . 59
5.5 Results of the obstacle detection node based on distance for 5000 iterations. . . . . . . . 61
Acronyms
2D Two-Dimensional
3D Three-Dimensional
ADC Analog to Digital Converter
AMCL Adaptive Monte Carlo Localization
CPU Central Processing Unit
DC Direct Current
EKF Extended Kalman Filter
GPS Global Positioning System
GPU Graphics Processing Unit
I/O Input/Output
I2C Inter-integrated Circuit
IMU Inertial Measurement Unit
IR Infrared
ISO International Organization for Standardization
KF Kalman Filter
LED Light Emitting Diode
LIDAR Light Detection and Ranging
MCL Monte Carlo Localization
OpenCV Open Source Computer Vision Library
PCB Printed Circuit Board
PCL Point Cloud Library
PDF Probability Density Function
PF Particle Filter
PWM Pulse Width Modulation
RAM Random Access Memory
RANSAC RANdom SAmple Consensus
RC Radio Controlled
RGB-D Red, Green, Blue, and Depth
RGB Red, Green, and Blue
ROS Robot Operating System
RViz ROS Visualization
SAE Society of Automotive Engineers
UART Universal Asynchronous Receiver-Transmitter
UKF Unscented Kalman Filter
USB Universal Serial Bus
VO Visual Odometry
YOLO You Only Look Once
1 Introduction
Contents
1.1 Purpose and motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Goals and challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Original Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Document organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1 Purpose and motivation
According to the European Commission, in 2016, 25,500 people died on European roads, more than 70 per day [2]. In addition, as stated in the European Union's ambitious Road Safety Programme [3], for every death there are an estimated four permanently disabling injuries (brain or spinal cord damage), eight serious injuries and 50 minor injuries. In a different study, in 2015, the National Highway Traffic Safety Administration stated that 94% of car crashes in the United States alone were due to human error [4].
Autonomous vehicles have proven to substantially increase road safety, not only by decreasing the number of car accidents and deaths, but also by improving traffic flow and efficiency. On the one hand, they allow the reduction of congestion, pollution, the demand for well-defined roads, and the excessive and expensive use of road illumination. On the other, they increase passenger comfort and can bring independence back to elderly and disabled people. Moreover, the travel time can be used to work or relax.
With this premise in mind, many car manufacturers have been releasing the first commercially available driving systems of the Conditional Automation type, seen as the first level of system capable of sensing the surrounding environment, as described in the Society of Automotive Engineers (SAE) International Standard J3016 [5].
Although continuous technological advancements are witnessed on a daily basis, much remains to be done before commercially available fully autonomous systems are a reality. In recent years, a great deal of work and research has been carried out, not only by car manufacturers but also by academia and other institutions.
At the national level, for the past years, the Sociedade Portuguesa de Robótica has organized the Autonomous Driving competition of the Portuguese Robotics Open [1]. Designed for students, this competition aims to promote the development of autonomous vehicles in a scenario that resembles an urban road.
The purpose of this work is to build on the state-of-the-art literature on self-driving cars and create an affordable solution that can be extrapolated to real-life scenarios, in an attempt to reduce the number of car accidents and the associated injuries and deaths.
1.2 Problem statement
The objective of the Autonomous Driving competition of the Portuguese Robotics Open is to have a fully autonomous robot perform a set of road challenges that are similar to real-life scenarios.
The competition environment is partially depicted in Fig. 1.1 and consists of a closed, eight-shaped track that fits in a 16.70 m × 6.95 m rectangle, with two intersections, a zebra crossing, traffic lights, vertical traffic signs, and a parking area.
Figure 1.1: Track overview with zebra crossing, obstacle, and tunnel. Obtained from [1].
From the first to the last rounds, the assigned tasks increase in difficulty. The competition starts as a pure speed run, with just a start signal, and from one round to the next, different challenges are introduced. The challenges come in various forms: respecting the instructions provided by semaphores, recognizing vertical traffic signs, performing parallel and perpendicular parking, safely avoiding obstacles, and moving under diminished light (tunnel) and in under-construction areas.
The semaphores and their instructions are presented in Fig. 1.2.
(a) Left (b) Forward (c) Right (d) Finish (e) Stop (f) Parking
Figure 1.2: Different instructions provided by the semaphores used in the competition.
The different traffic signs that have to be recognized are depicted in Fig. 1.3.
(a) Bus Stop (b) Mandatory Medium Lights (c) Roundabout (d) Mandatory Left (e) Hospital (f) Park (g) Pedestrian Crossing (h) Recommended Speed (i) Animals (j) Depression (k) Lane Suppression (l) Caution
Figure 1.3: Different traffic signs available on the competition.
Given the broad scope of the areas concerned by the competition challenges, the full extent of the problem is split into two parallel theses, developed at the same time. On the hardware side, the present thesis focuses on the electronics of the robot; on the software side, on the modules required to model the environment, infer the absolute localization of the robot, and detect and recognize the objects placed on the track. This information is to be used by the navigation and decision-making modules, developed in Francisco's thesis [6].
1.3 Goals and challenges
The environment of the competition has a mixture of both indoor and outdoor characteristics. As it takes place inside a warehouse, the Global Positioning System (GPS) is not available, resembling an indoor environment. However, the track is placed in a flat open-space area without any nearby walls, resembling an outdoor environment.
The objects inserted on the track for the different challenges, e.g., obstacles and traffic signs, do not have a fixed position and, consequently, cannot be used for global localization purposes. The only information available for absolute localization is the road's landmarks. Due to the lack of fixed Three-Dimensional (3D) objects, Light Detection and Ranging (LIDAR) sensors are practically unusable for localization purposes, since they only perceive distance, not color. In these conditions, the use of vision is almost mandatory.
Although the dimensions of the track are known, since it is constructed manually and the carpet can fold, the exact location and dimensions of the road's landmarks may be subject to errors. As described further in this document, an accurate map is needed for an accurate position estimate. Manually modeling the map is very time-consuming and error-prone. Since the amount of time before the competition is very limited, the map needs to be constructed automatically by navigating the track before or during the competition.
As described in the competition's rules [1], each round is scored according to an equation that takes two aspects into consideration: execution time and penalties. Penalties are given for infractions committed during each challenge. This forces all competitors to create driving systems that are both reliable and fast, which also imposes a real-time constraint. Taking those aspects into consideration, and adding the motivation to address a real-world problem with a limited amount of resources, it was decided to prototype a non-holonomic car-like robot.
Considering the competition rules, the experience from previous performances, and the restrictions on the available resources, a set of specifications for the robot is defined. These are outlined in Table 1.1.
Table 1.1: System specifications.

Target speed: 2 m/s
Position accuracy: ±0.025 m
Orientation accuracy: ±5°
Obstacle detection accuracy: ±0.05 m
Obstacle identification rate: 5 Hz
Pose estimation rate: 10 Hz
Velocity is a key factor in the competition. The highest speed from previous competitions was approximately 2 m/s; thus, it is set as the target. This also sets the real-time constraints for the system and the developed modules.
The position accuracy is determined by the competition rules. At the end of a challenge, the au-
tonomous vehicle must stop with its front-most part within a 5 cm thick line, hence the defined accuracy.
The remaining specifications are defined to ensure that the navigation and decision-making modules
have the necessary accuracy and rate to drive the car at the imposed target speed and quickly react to
the challenges ahead.
1.4 Original Contributions
A list of the original contributions of the present thesis is provided below:
• Dimensioning and design of the necessary hardware components that support and power the autonomous vehicle and serve as the basis for an academic research platform. This includes reviewing state-of-the-art sensors and actuators and their applicability to the competition environment, as well as the necessary hardware to bridge the low-level sensors with the Robot Operating System (ROS).
• Implementation of a novel algorithm that simulates virtual walls from the track's lines, enabling their use with Monte Carlo Localization (MCL) algorithms.
• Adaptation and customization of state-of-the-art algorithms used for the mapping and localization of a robot.
• Implementation of state-of-the-art methods for semaphore and traffic sign detection, which range
from image processing to machine learning techniques.
1.5 Document organization
With the motivation, goals, and contributions of the present thesis described, the organization of the remainder of the document is as follows:
Chapter 2 reviews the state-of-the-art literature applied in the mobile robotics field that can also be applied in the context of the Robotics Open's Autonomous Driving challenge.
Chapter 3 presents a proposal of the hardware and software components that are used and implemented in this work. The system architecture and the contribution of each component to the end goal are also described.
Chapter 4 outlines the design and implementation of each individual hardware component and software node. For the software nodes, the technical aspects of each algorithm are also described in more detail.
Chapter 5 describes the evaluation methodologies carried out to determine the performance of each individual node. On a broader scope, the input of each module to the end goal is characterized, along with the performance in a real scenario: the Robotics Open's Autonomous Driving competition.
Chapter 6 summarizes the accomplishments of the present work and provides an overview of the
improvements and future work that can be done using this thesis as a basis.
2 Related Work
Contents
2.1 Robot pose estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Perception of the Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1 Robot pose estimation
2.1.1 Localization
A robot's localization is defined as the problem of estimating its pose relative to its surrounding environment. The robot's pose is composed of its location and orientation relative to a global coordinate frame. Therefore, localization can be viewed as a coordinate transformation problem.
In a real scenario, there is no way of obtaining the pose directly from a single sensor measurement, according to Thrun et al. [7]. As stated by the authors, the reason for this is a key element of robotics: uncertainty.
Virtually all mobile robot localization literature makes use of probabilistic laws to account for uncertainty. This comes from the fact that, for a robot to be able to perform its task without any human intervention, it needs to deal with its uncertainty [8, 9].
In probabilistic robotics, to cope with uncertainty, instead of a single true state hypothesis, the state
is represented by a Probability Density Function (PDF) over the possible values.
The PDF of the state is coined in [7] as the robot's belief. Belief can be seen as the robot's internal knowledge about the state of the environment, built from all past observations and controls while accounting for their uncertainties. Hence, the belief of a state variable x at time t, x_t, is defined by the following equation:

bel(x_t) = p(x_t \mid z_{1:t}, u_{1:t}),  (2.1)
where z_{1:t} represents all past sensor measurements and u_{1:t} all past controls. The reason behind the distinction between control and sensor data will be covered further ahead.
To calculate the belief distribution, the most widely used framework is the Bayesian framework. It is interesting to see how even the most recent literature is still based on this framework, 50 years after it was introduced by Sorenson and Stubberud [10] for approximation of the a posteriori density function.
2.1.1.A Bayesian framework
The Bayesian theorem has served as a basis to the majority of localization algorithms that have been introduced; thus, understanding its concepts is essential. The Bayes filter performs state estimation recursively, and each iteration is composed of two steps: prediction and correction [7].
Prediction or control update attempts to estimate the current state using only the controls provided to the robot, before the sensor measurements are included. To distinguish it from the posterior, bel(x_t), it is usually denoted \overline{bel}(x_t) and defined by the equation:

\overline{bel}(x_t) = \int p(x_t \mid u_t, x_{t-1}) \, bel(x_{t-1}) \, dx_{t-1}.  (2.2)
Correction or measurement update uses the sensor data to tighten the state PDF, i.e., to increase the overall certainty of the robot. This is accomplished by multiplying the a priori belief, \overline{bel}(x_t), by the measurement probability, which describes how the measurement z is generated from the environment state x. This step is expressed by the equation:

bel(x_t) = \eta \, p(z_t \mid x_t) \, \overline{bel}(x_t),  (2.3)

where \eta is a normalizing constant.
Due to its complexity, the original algorithm can only be implemented for very simple estimation problems. To deal with realistic scenarios, numerous approximations of the Bayes filter algorithm were introduced. The most frequently used derivations are Gaussian and particle filters.
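The predict/correct recursion above can be sketched with a discrete Bayes filter over a one-dimensional grid of candidate poses. This is a minimal illustration, not the implementation used in this thesis; the motion kernel and measurement likelihood below are made-up values.

```python
import numpy as np

def predict(belief, motion_kernel):
    # Control update: spread the belief according to the (noisy) motion model,
    # a discrete version of bel_bar(x_t) = integral p(x_t | u_t, x') bel(x') dx'.
    return np.convolve(belief, motion_kernel, mode="same")

def correct(belief_bar, likelihood):
    # Measurement update: bel(x_t) = eta * p(z_t | x_t) * bel_bar(x_t),
    # where dividing by the sum plays the role of the normalizer eta.
    posterior = likelihood * belief_bar
    return posterior / posterior.sum()

# Uniform prior over 10 grid cells: the robot is globally uncertain.
belief = np.full(10, 0.1)

# Illustrative "stay in place with slippage" motion model.
motion_kernel = np.array([0.1, 0.8, 0.1])

# Illustrative sensor likelihood peaking near cell 3.
likelihood = np.array([0.01, 0.05, 0.2, 0.5, 0.2, 0.05, 0.01, 0.01, 0.01, 0.01])

belief = correct(predict(belief, motion_kernel), likelihood)
print(belief.argmax())  # most likely cell: 3
```

Iterating the two steps over successive controls and measurements yields exactly the recursion of the Bayes filter; the Gaussian and particle filters discussed next replace the grid with parametric and sample-based representations of the belief, respectively.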
2.1.1.B Gaussian filters
Gaussian filters, as the name implies, approximate the posterior PDF by a multivariate normal distribution. This means that the posterior is solely described by its mean, μ, and covariance, Σ.
The most notable and studied implementations of Gaussian filters are the Kalman Filter (KF) and
its variants [7]. The KF was introduced around 1960 and is named after its author, Rudolf
Kalman [11]. It is a technique for filtering and prediction in linear systems. Although it was not originally
designed for robotics applications, this method has been heavily studied in the field for its ability to
implement the Bayes filter [12, 13]. In practice, however, there are very few occasions where systems
are truly linear, rendering the linear KF unusable in most real-world systems. To tackle this limitation,
various extensions of the original Kalman Filter were introduced.
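The predict/correct structure of the KF is easiest to see in a scalar sketch; the random-walk motion model and the noise values q and r below are illustrative.

```python
# One predict/correct cycle of a scalar Kalman filter for the linear system
# x_t = x_{t-1} + u_t + noise, z_t = x_t + noise.

def kf_step(mu, sigma2, u, z, q=0.1, r=0.5):
    # Prediction (control update): shift the mean, grow the variance.
    mu_bar = mu + u
    sigma2_bar = sigma2 + q
    # Correction (measurement update): blend prediction and measurement
    # through the Kalman gain k.
    k = sigma2_bar / (sigma2_bar + r)
    mu_new = mu_bar + k * (z - mu_bar)
    sigma2_new = (1.0 - k) * sigma2_bar
    return mu_new, sigma2_new

mu, s2 = 0.0, 1.0                       # initial state estimate and variance
mu, s2 = kf_step(mu, s2, u=1.0, z=1.2)  # commanded 1 m forward, measured 1.2 m
```

The posterior mean lands between the prediction and the measurement, and the variance shrinks - the correction step tightening the PDF, as described for the Bayes filter.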
The Extended Kalman Filter (EKF) [14] is able to overcome the linearity assumption of regular KF by
linearizing the system using first order Taylor expansion. By performing linearization, the computational
complexity of the system is reduced. The EKF has shown good practical results when systems are
approximately linear [15]. For that reason, it has become one of the most popular methods for state
estimation [7]. Nevertheless, since it is based on linearization, when systems are highly non-linear
the EKF approximation presents a significant amount of error; in [16], this was shown to be its major
weakness.
With this in mind, Julier et al. [17] introduced the Unscented Kalman Filter (UKF). Its name derives
from the used sampling technique, the unscented transform. To approximate the non-linear system, a set
of sample points, denominated by the authors sigma points, are chosen around the state mean. These
points are then propagated through the non-linear function, from which a new mean and covariance are deduced.
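For a scalar state, the unscented transform can be sketched as follows; the simplified weights (shared between mean and covariance) and the test function g are illustrative.

```python
import math

# Unscented transform sketch for a scalar state (n = 1): pick sigma points
# around the mean, push them through a non-linear function g, and recover a
# new mean and variance from the weighted transformed points.

def unscented_transform(mu, sigma2, g, lam=2.0):
    n = 1
    spread = math.sqrt((n + lam) * sigma2)
    points = [mu, mu + spread, mu - spread]
    weights = [lam / (n + lam), 1 / (2 * (n + lam)), 1 / (2 * (n + lam))]
    ys = [g(p) for p in points]
    mu_y = sum(w * y for w, y in zip(weights, ys))
    sigma2_y = sum(w * (y - mu_y) ** 2 for w, y in zip(weights, ys))
    return mu_y, sigma2_y

# For g(x) = x^2 and a standard Gaussian, the true output mean is 1 and the
# true variance is 2; the transform recovers both without any linearization.
mu_y, s2_y = unscented_transform(0.0, 1.0, g=lambda x: x ** 2)
```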
As with all KF approaches, it only solves what is known in the literature as the position tracking or local
localization problem [18]. This is a consequence of representing belief using a normal distribution, which
is unimodal and, therefore, confined to a single peak. Moreover, these approaches rely on the assumption
that the initial state of the robot is known and that the uncertainty is always kept small [7].
To deal with global localization and the kidnapped robot problem, multi-modal approaches were
introduced. In the next section, this new set of filters is further described.
2.1.1.C Particle filters
Particle Filters (PF) were not the first multi-modal approach to localization. The first multi-modal ap-
proach was Grid-based Markov localization [19]. However, in [20], PFs were proven to be more memory
efficient, faster, more accurate, and easier to implement. For that reason, although it is historically
important, Grid-based Markov localization is not covered in this document.
The PF's name and success come from adding a set of randomized samples, also called particles, during
the prediction step [20]. This way, instead of having to reason about the probability distribution of the state, it
simply tries to randomly guess a possible one in a way that favors likely positions over unlikely ones. Due
to that, it has even been called the survival of the fittest algorithm [21], although it is mostly known as MCL [22]
or the condensation algorithm in computer vision literature [23].
The key idea of PF is to have N random particles that possess information about the robot’s pose
along with a weight that is related to the probability of that being the correct pose. The advantage of
Particle Filters over Gaussian ones is that the initial position does not need to be known. Due to its
survival-of-the-fittest mechanism, it converges to the correct pose at a rate of 1/√N. This means that the
bigger the number of samples, the faster it converges. However, this naturally comes with a higher
computational cost too.
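A minimal MCL sketch on a hypothetical 1D corridor; the landmark position, the range-only measurement model, and the noise values are invented. Note that the measurement z = 4 is consistent with two poses (x ≈ 1 and x ≈ 9), so the particle set legitimately keeps both peaks - exactly the multi-modality a Gaussian filter cannot represent.

```python
import math
import random

# Minimal MCL sketch: N weighted pose hypotheses are moved by a noisy motion
# model, weighted by the likelihood of a range measurement to a single
# landmark, and resampled.

def mcl_step(particles, u, z, landmark=5.0, motion_noise=0.1, meas_noise=0.5):
    # Prediction: sample each particle through the motion model.
    moved = [x + u + random.gauss(0.0, motion_noise) for x in particles]
    # Correction: weight each particle by p(z | x).
    weights = [math.exp(-0.5 * ((z - abs(landmark - x)) / meas_noise) ** 2)
               for x in moved]
    # Resampling: "survival of the fittest" - likely poses are drawn more often.
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(0)
particles = [random.uniform(0.0, 10.0) for _ in range(500)]  # unknown initial pose
for _ in range(5):
    particles = mcl_step(particles, u=0.0, z=4.0)  # robot is 4 m from the landmark
```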
Although regular MCL has advantages over other multi-modal approaches, it has its pitfalls too.
Since the particles are inserted randomly, it might happen that none of the generated particles is near
the correct pose. This effect is known as the particle deprivation problem [7].
Another disadvantage (and a highly counter-intuitive one) is that regular MCL becomes arbitrarily in-
accurate the more accurate the sensors are. To picture this, one must imagine a scenario where the sensors
have no noise. In such conditions, the state probability will be zero for all states except the single Dirac
peak that actually corresponds to the true state. When generating samples, it is very unlikely to
generate a sample that matches the robot's true pose exactly. Consequently, when assessing the im-
portance of the samples, all will have arbitrarily low weight unless, by accident, one actually matches the
true pose. Although this is a hypothetical scenario and there are no noise-free sensors, it exemplifies
the problem in regular MCL, which arises from the fact that sensor measurements are not taken into account
in the prediction step.
A variant called Mixture-MCL [24] claims to mitigate the aforementioned problems by using MCL com-
bined with its “dual”. The “dual” of MCL can be seen as an order inverted MCL, i.e., it first incorporates
sensors measurements to guess poses and, only then, it takes control data to validate the compliance
of the guess. In [24], it is stated that neither MCL nor dual MCL alone provides a good guess but that,
together, they can achieve much better results for a smaller number of samples. As described, the
computational cost increases with the number of samples N, thus, it is preferable to keep this value as
small as possible.
Also with this idea in mind, the Adaptive Monte Carlo Localization (AMCL) [20] was introduced.
The key idea in this approach is that the number of samples varies depending on the uncertainty of
the robot. Whenever the probability associated with all samples is small, it means that the certainty that
those correspond to a correct pose is also small. For that reason, to make the filter converge faster, the
number of samples is increased. It can also work the other way around, i.e., when there are too many high-
probability samples, the number of samples is reduced. This way, computational resources can be
efficiently managed without compromising the overall accuracy.
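The published method derives the sample count from a statistical bound; the toy sketch below only captures the grow-when-uncertain, shrink-when-confident intuition described in the text, with purely illustrative thresholds and bounds.

```python
# Caricature of adaptive sample sizing: grow the particle set when all weights
# are low (high uncertainty), shrink it when many particles score highly.

def adapt_sample_size(weights, n_min=100, n_max=5000, low=0.2, high=0.8):
    avg = sum(weights) / len(weights)   # weights assumed normalized to [0, 1]
    if avg < low:                       # uncertain: add samples to converge faster
        return min(n_max, 2 * len(weights))
    if avg > high:                      # confident: free computational resources
        return max(n_min, len(weights) // 2)
    return len(weights)                 # otherwise keep the current sample count
```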
More recently, in 2009, an approach called Self-adaptive Monte Carlo Localization [25] was intro-
duced to deal with the randomness problem described. In order to reduce the computational burden of
traditional MCL approaches, the authors added the concept of Similar Energy Regions. Similar Energy
Regions are calculated when the map is built and pre-cached before the localization process starts. The
key difference is that, instead of adding samples randomly across the map, they are added in regions that
have an increased likelihood of containing the robot. By adding samples in smaller regions, the
number of samples can be reduced, consequently decreasing the computational cost.
2.1.2 Sensors and techniques used for pose estimation
To decrease the overall uncertainty associated with the robot pose estimation, accurate sensor data
is required. To achieve this, many authors have used different sensors and techniques to increase
the accuracy of their systems. With this in mind, the literature’s most commonly used sensors and
methodologies that can be applied in the competition’s premises are reviewed.
2.1.2.A Wheel odometry
The most widely used source of pose information in mobile robotics is, incontestably, wheel odometry. This classical
technique, used in wheeled robots, is based on the assumption that wheel revolutions translate
into linear movement of the vehicle. Since the majority of mobile robots are wheeled, it is natural that it
is so vastly used.
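A minimal dead-reckoning sketch, assuming a differential-drive robot with wheel encoders; the tick resolution, wheel radius, and wheel base are illustrative. The cumulative nature of odometry is visible in the code: every increment is rotated by the previous heading, so any heading error contaminates all subsequent positions.

```python
import math

# Incremental dead-reckoning sketch: encoder ticks per wheel are converted
# into a pose increment (x, y, theta). Parameters are illustrative.

TICKS_PER_REV = 1024
WHEEL_RADIUS = 0.05   # m
WHEEL_BASE = 0.30     # m (distance between the two wheels)

def odom_update(x, y, theta, ticks_left, ticks_right):
    d_left = 2 * math.pi * WHEEL_RADIUS * ticks_left / TICKS_PER_REV
    d_right = 2 * math.pi * WHEEL_RADIUS * ticks_right / TICKS_PER_REV
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / WHEEL_BASE
    # Each increment inherits the error of the previous heading estimate.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return x, y, theta + d_theta

pose = (0.0, 0.0, 0.0)
pose = odom_update(*pose, 1024, 1024)  # one full revolution per wheel: straight line
```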
Odometry is calculated incrementally and, due to its cumulative nature, it is very sensitive to errors.
In the literature, two distinct types of errors are described: systematic and non-systematic [26].
Systematic errors stem from the intrinsic characteristics of the robot, according to the studies of Borenstein
and Feng [26]. As pointed out by the authors, these are mainly related to different wheel radii
and wheel misalignment. To compensate for those, the authors proposed a calibration technique,
named UMBmark.
Non-systematic errors can be interpreted as random perturbations to a robot's position. These are
usually associated with wheel slippage, wheel backlash, or uneven floor. In an attempt to mitigate their effect,
Wang [27] modeled the perturbations using a mean vector and covariance matrix to compensate for the
errors. However, modeling the error is not a trivial task.
For that reason, more recently, Doh and Chung [28] proposed a method that estimates the error
parameters and covariance matrix. The method consists of navigating the same path twice with the sys-
tematic errors fully compensated, the difference between the two odometric paths being the non-systematic
error. The authors claim numerous advantages over previous techniques, namely: better results than
the previously most used technique, UMBmark; a model that can be extended to all mobile states, such
as the heading angle of the robot; and applicability to different kinematic systems.
2.1.2.B Visual odometry
Since the use of vision is practically mandatory for this challenge, as described in Section 1.3, it can
be used as an alternative to wheel odometry.
The concept of visual odometry is simple: estimate the motion from consecutively captured images.
Since it incrementally attempts to estimate the pose of the camera, similarly to wheel odometry, it has
been coined as Visual Odometry (VO). In the literature different types of cameras have been used for
this purpose. Their strengths and weaknesses are summarized below.
Classical techniques simply use a monocular camera [29]. However, since monocular cameras are
aware of neither scale nor depth, the classical technique performs worse than the remaining candidates, so
it will not be detailed further.
Nister et al. [30] proposed a system that, contrary to the previous one, uses more than one camera.
With this approach it is possible to estimate the 3D position of landmarks from consecutive images,
rendering it very accurate. The authors have shown errors of 1 % over the path's length, which is
considerably accurate for camera-based odometry. One of the benefits of visual odometry over
wheel encoders is that it is not affected by the typical wheel odometry sources of error, namely,
slippage and skid. However, having more than one camera renders the setup hard to calibrate, and it is
computationally heavier, since their algorithm was capped at 13 Hz.
Steinbrucker et al. [31] presented a novel approach to visual odometry using Red, Green, Blue, and
Depth (RGB-D) sensors like Microsoft's Kinect. Since depth data is directly output by the Kinect
sensor, this approach stands out for its great accuracy and low computational effort when compared
to previous stereo camera approaches. The authors accomplish this by means of energy minimization
to estimate the camera motion between consecutive RGB-D images.
2.1.2.C Inertial Measurement Unit
An Inertial Measurement Unit (IMU) uses a combination of sensors that allows the measurement
of actuation forces and angular velocities of a body. The sensors generally comprised within these
units are accelerometers and gyroscopes. The combination of the two provides a reliable source
of orientation; however, since both sensors only provide relative measurements, the orientation tends to drift over extended
periods of time [32]. This drift can be fully compensated for if an absolute source of orientation is
added. By measuring Earth's magnetic field, a compass is capable of contributing the necessary
absolute orientation reference and is commonly included in IMUs to reduce the drift to an insignificant
value [33].
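The gyro/compass combination is often realized as a simple complementary filter; the blend factor, gyro bias, and rates below are illustrative. Pure integration of the simulated 0.01 rad/s bias would accumulate 0.04 rad of drift over the 200 steps, while the small compass correction keeps the estimate bounded near zero.

```python
# Complementary filter sketch for heading: integrate the (relative) gyro rate,
# then pull the estimate slightly towards the (absolute) compass reading.

def fuse_heading(heading, gyro_rate, compass, dt, alpha=0.98):
    gyro_estimate = heading + gyro_rate * dt              # smooth but drifts
    return alpha * gyro_estimate + (1 - alpha) * compass  # compass bounds the drift

heading = 0.0
for _ in range(200):   # stationary robot, biased gyro, compass reads 0 rad
    heading = fuse_heading(heading, gyro_rate=0.01, compass=0.0, dt=0.02)
```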
2.1.2.D Sensor Fusion
In the literature, the fusion of odometry sources, whether wheel- or vision-based, with inertial
data is proven to be a more accurate approach than using the different types of data separately [32,34–
37]. This comes from the fact that odometry and inertial data are complementary to each other.
Even though the sensors used may differ, all authors have one thing in common: the use of the KF or
one of its variants to fuse the different types of data. Due to its small computational requirements, elegant recursive
properties, and ability to mix data arriving at different frequencies, it is the most widely used algorithm for the
task of sensor fusion. These filters were already described in Section 2.1.1.B.
2.2 Perception of the Environment
2.2.1 Mapping
A map is defined as the robot's representation or knowledge of its surrounding environment. The in-
formation it carries is crucial for an autonomous vehicle to estimate its position relative to the observed
world. It has been considered in [38] as one of the most important aspects to achieve truly autonomous
mobile robots. Due to its importance, different types of representation have been studied over time.
These highlight different aspects of the environment that are necessary for the robot to fulfill its mis-
sion. Likewise, to navigate safely, the robot needs to represent not only its own position
but also the position of non-transposable objects. In this section, the distinct ways of representing knowledge
about the environment and its different variables are enumerated and summarized.
2.2.1.A Types of representation
Topological maps are a semantic world representation, usually represented as a connected graph
[8, 39–41]. In this kind of representation, the nodes correspond to places of interest while the
connections between the nodes correspond to paths between those places. In order for a robot to localize
itself, it needs to uniquely identify the places via sensorial input. This representation allows an
abstract relation between nodes, which can reduce the complexity of trajectory planning tasks.
However, pure topological approaches avoid as much as possible geometrical or geographical
relations between nodes [42]. Without resorting to any form of metric information, performing
safe navigation on obstructed sites is nearly impossible [43], which is the major weakness of
topological maps.
Metric maps represent the environment as a geometric relation between objects and a fixed reference
frame. The geometric relation is metric, hence the name. Metric maps are further divided in two
main categories:
• Feature maps represent the environment with a collection of geometric shapes [44,45]. The
mapped shapes can go from corners or edges to more complex shapes like blobs. To create
the map, features are observed and compared to previously stored ones. Due to this fact,
Bailey [43] concluded that this mapping approach is very sensitive to false data association.
However, since the environment is reduced to its descriptors and these hold very little infor-
mation - position and shape, for instance - feature maps are computationally light and efficient
when compared to other representation forms.
• Occupancy grids get their name from the way the environment is represented - in a grid. In the
classical occupancy grid approaches [8, 46, 47], the grid size is fixed. Within the grid, each
cell represents a region of the environment. These cells contain the probability of being occu-
pied, ranging from 0 to 1, representing unoccupied or occupied space respectively. Because
it directly carries information about the transposable and non-transposable space, this form
of representation is able to reduce the complexity of navigation and decision-making mod-
ules and has proven to be more resilient to misreadings than feature-based mapping [43].
The major setback of traditional occupancy grids is the trade-off between accuracy and com-
putational effort. Maintaining a large occupancy grid can be computationally heavy. Arleo
et al. [48] introduced a novel adaptive grid-size technique to mitigate the impact of storing
and processing high-resolution grids. This way, cells can be kept at a lower resolution and, when
needed, each cell can be further subdivided in a hierarchical tree-based form. Due to its
form, this structure was later coined quadtree [49] or octree [50] for Two-Dimensional (2D) and 3D
environments respectively.
Appearance Maps describe the environment directly from raw sensorial data. The classical methods
mainly focused on laser range data [51–53] but the idea was later extrapolated to vision sensors [54–56].
As metric approaches only use part of the sensed data to perceive the environment (features, for
instance), using the sensors' direct output allows a much richer description of the environment to be
obtained [43]. The mapping characteristics of appearance-based approaches are very similar to
topological ones. Instead of recognizing that the robot is inside a place (topological map), in
appearance maps the pose of the robot is directly extracted from sensor data. To accomplish that,
a function to match new data against stored data needs to be created. In a real scenario, creating this function
can sometimes be an impossible task. This is true, for instance, in long or symmetric corridors,
which happens to be the case on the competition's track.
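The per-cell occupancy probability described above is typically stored in log-odds form, which turns every sensor update into a simple addition. A single-cell sketch with an illustrative inverse sensor model:

```python
import math

# Log-odds occupancy update for one grid cell: repeated "occupied" readings
# from a noisy sensor push the cell probability from 0.5 (unknown) towards 1.

L_OCC = math.log(0.7 / 0.3)    # log-odds increment when the cell is hit
L_FREE = math.log(0.3 / 0.7)   # log-odds decrement when the beam passes through

def update_cell(l, hit):
    return l + (L_OCC if hit else L_FREE)

def probability(l):
    """Convert log-odds back to an occupancy probability in [0, 1]."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

l = 0.0                        # prior: p = 0.5, unknown space
for _ in range(5):
    l = update_cell(l, hit=True)
p = probability(l)
```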
2.2.1.B Static vs Dynamic environments
Besides the form of representation, an environment is also characterized by the number of state
variables. The state can be seen as the collection of all aspects of the robot and its environment that can
impact the future.
An environment is defined as static when the only variable is the robot’s pose [43]. This means that
all of the environment's objects will always remain in the same place. On the other hand, when the environment's
knowledge changes over time (either objects are inserted, removed, or moved), the environment is
defined as dynamic.
Most of the classical mapping techniques assumed the environment to be static. In real-world envi-
ronments, this assumption is very unlikely to hold. Wolf and Sukhatme [57] proposed a separation
between the static and dynamic objects on the map. Later, the same authors showed in [58] an online ap-
plication of this method to localize the robot in a static map and navigate without collisions by avoiding
the objects in the dynamic map.
2.2.2 Obstacle detection
In the literature, mainly two types of sensors are used for the purpose of identifying obstacles: cam-
eras and laser scanners. Vision-based sensors have proven to be a much cheaper alternative to LIDAR-
based ones and, due to the cost limitation, state-of-the-art algorithms using laser scanners will not be
covered.
Since roads are approximately flat, obstacles are usually defined as objects that have height. As
such, non-transposable terrain can be identified by detecting objects that stand on the ground plane
but do not belong to it. In computer vision literature, non-transposable terrain is commonly obtained by
performing image segmentation of the different planes.
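Under the flat-road assumption described above, obstacle extraction reduces to a height test against the ground plane. A sketch, assuming the points are already expressed in a robot frame whose z = 0 plane coincides with the road; the clearance threshold and the sample cloud are illustrative.

```python
# Plane-based obstacle extraction sketch: points standing higher than a small
# clearance above the assumed ground plane z = 0 are marked non-transposable.

def segment_obstacles(points, ground_clearance=0.05):
    """points: iterable of (x, y, z) in metres; returns the obstacle subset."""
    return [p for p in points if p[2] > ground_clearance]

cloud = [(1.0, 0.0, 0.01), (1.2, 0.1, 0.02),   # road surface (sensor noise)
         (2.0, 0.5, 0.30), (2.0, 0.5, 0.45)]   # a box standing on the road
obstacles = segment_obstacles(cloud)
```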
In [59], a monocular camera is added to a car to detect road obstructions. The proposed system
works in three steps: depth map computation, obstacle extraction, and data fusion. Since a monocular
camera does not possess depth information, the depth map is computed from multiple consecutive frames.
Then, by assuming that the car is moving on a plane, any object lying on it is an obstacle. By fusing the
odometry data with the depth information, an occupancy grid map is generated and the non-transposable
objects are added.
To decrease the complexity of the system and the computational burden of calculating the depth
map from consecutive images, Fleischmann and Berns [60] used a stereo vision approach. The authors
demonstrate the applicability of a point cloud to distinguish soft objects found on the
road, like grass and weeds, from dangerous solid objects that cannot be traversed. This is achieved by an-
alyzing the density and distribution of the captured points and the normal of the fitted plane. Performing
neighborhood analysis on those clusters allows their classification as either obstacle or non-obstacle.
In the same way, Holz et al. [61] also identify the planes corresponding to the navigable surfaces
while identifying their obstacles. In their research, they take advantage of Microsoft's Kinect sensor to
combine Red, Green, and Blue (RGB) images with depth information to directly form a point cloud.
The Kinect's advantage comes from its cost, performance, and ease of use and configuration. Contrary to
stereo systems, it takes little to no effort to set up and is computationally faster, since the point cloud
does not need to be computed. Their research demonstrates how to segment the different planes to obtain
the navigable surface at frame rates up to 30 Hz.
Inspired by the latter work, Peasley and Birchfield [62] demonstrate the applicability of the Kinect's
depth information to generate, in real time, an occupancy grid with the transposable and non-transposable
terrain.
2.2.3 Traffic sign detection and recognition
Due to its applicability in real-world scenarios, a large amount of research on traffic sign detection
and recognition has been conducted. For the detection part, authors focus mainly on two aspects of the
visual information: color and shape. These aspects are natural choices, as color is used to call
human attention and shape is used for categorization. In the recognition part, the detected signs
are then compared with the set of all traffic signs for a potential match.
In this section, the literature on both color- and shape-based detection types and recognition tech-
niques is reviewed.
2.2.3.A Color-based detection
Cameras commonly output visual information in the RGB color space. The earliest techniques [63]
intuitively make use of the direct output of cameras to segment the image into potential sign candi-
dates. These regions of interest are selected according to their color information. Since parts of the
direct output of the cameras are selected according to certain color thresholds, this technique is compu-
tationally light. However, the values of Red, Green, and Blue pixels depend on the intensity of the light,
rendering the technique very sensitive to changes in ambient light conditions.
To overcome this dependency, the RGB image is usually converted into a different color space. Other
color spaces that have been used in the literature are YCbCr [64], XYZ [65], and HSI [66]. Although there
is no de-facto standard for the color space used in the literature, the most commonly used ones are the
hue-saturation color spaces - HSI, HSL, and HSV - according to the survey by Fu et al. [67]. Their
popularity stems from being based on human color perception. Recent studies continue to show interest in
human color perception, and even more complex color spaces are being utilized, like CIELAB, CIELUV,
and CIECAM [68]. The latter has displayed up to 8 % better results in different weather conditions when
compared to HSI.
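A sketch of why hue-saturation spaces help: the hue of a red pixel barely changes as lighting dims, so the same test keeps firing where a fixed RGB threshold would fail. The hue range and the saturation/value floors below are illustrative, not tuned values.

```python
import colorsys

# Hue-based candidate test sketch: convert an RGB pixel to HSV and check
# whether it falls inside an illustrative "red traffic sign" range. Red wraps
# around the ends of the hue circle, hence the two-sided test.

def is_red_candidate(r, g, b, s_min=0.5, v_min=0.2):
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    red_hue = h < 0.05 or h > 0.95   # roughly +/- 18 degrees around pure red
    return red_hue and s >= s_min and v >= v_min

bright_sign = is_red_candidate(200, 30, 30)   # well-lit red pixel
dim_sign = is_red_candidate(120, 20, 20)      # same sign at dusk: hue unchanged
asphalt = is_red_candidate(90, 90, 90)        # grey road surface
```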
Besides color-thresholding, region growing techniques [69] are also used to identify hypothesis re-
gions. As traffic signs are generally bounded by a color, this algorithm takes advantage of this and
attempts to identify closed colored shapes. If the closed condition is not met, the region is not consid-
ered.
In addition to the weather and daylight conditions, outdoor environments have a high amount of occlusion
and reflections that can alter the color of the observed signs. Since color information cannot be retrieved
in some occasions, shape-based techniques have been studied as an alternative to color-based detec-
tion.
2.2.3.B Shape-based detection
Given that the geometric form of the traffic signs is well-known, many authors proposed detection
techniques based on their outline. The first attempt to extract the object features used the Hough Trans-
form [70]. This transform is capable of detecting regular shapes such as lines or circles. Since it uses
gray-scale images, its more immune to variations in the light. Yet, Hough Transform is computationally
complex and memory hungry [71].
To reduce the computational complexity, an edge detector pre-processing of the image is used. The
most commonly used edge detectors in the literature for this purpose are Sobel [72], Laplacian [73],
and Canny [74]. Sobel and Laplacian detect edges based on the gradient of pixels from the first- and
second-order derivatives, respectively. Canny first uses a filter to reduce noise, then applies a Sobel operator and,
to select the best pixel for each edge, performs non-maximum suppression in the local neighborhood.
Due to the computational complexity of the Hough Transform, many authors sought alternatives to it.
Barnes et al. [75] have introduced a radial symmetry detector to detect Australian circular signs
in real-time. As the name implies, with this technique, the only regions of interest are the ones that
have radial symmetry. It was later extended [76] to also detect rectangular shapes. This method
displayed 95 % accuracy and proved to be very tolerant to in-plane rotations of the signs, i.e., rotations in the
plane of the image.
To deal with scale variation, Perez and Javidi [77] presented a detection mechanism based on
a composite filter and filter bank. Later, the same authors released a composite filter bank which
added the ability to detect signs with out-of-plane rotation, i.e., rotations out of the plane of the image.
Although shape-based methods have advantages over color-based ones, they also have their
own set of difficulties. As presented, to detect shapes, edge detection and matching algorithms are
needed. This makes it difficult to detect shapes when signs are distant or the image size is small.
Also, in cluttered scenes, many false positive shapes are detected, affecting the accuracy of the sign
recognition system.
2.2.3.C Hybrid Color-Shape-based detection
As the color and shape information are complementary, it is natural that authors combine them. The
advantages of each have been applied in the most recent works to increase the overall accuracy.
Nakamura et al. [78] proposed a system that first segments the image with color information and
then performs a shape analysis. In their work, the color segmentation is performed directly on RGB
images. Nonetheless, by applying a Neural Network to detect ring shapes, the authors claim 100 %
detection accuracy.
2.2.3.D Recognition
For the signs to be fully identified, after the shape and/or color is detected, the interior part of the
sign must be analyzed. This process is known as recognition. In the literature, the most commonly
used approaches for the purpose of recognition are template matching [74, 79–83], neural networks
[63,78,84,85], and, more recently, support vector machines [86–90].
Template Matching techniques assess the similarity between the identified regions and the sample
images provided beforehand. The templates are images or a derivative of them, e.g., a histogram or
pattern. Typically, a normalized cross-correlation method is utilized [79]. Due to its comparative
nature, this technique is very sensitive to the effects of reflection and occlusion. To mitigate this
effect, Paclik et al. [91] proposed a comparison of local features only. Despite this breakthrough,
as with the majority of template matching techniques, if the signs present rotation, the templates
must also include the rotation for an accurate identification. The latter creates an overhead in
the time required to generate the templates, as more rotated samples are required. Consequently,
the time to process through all of them also increases.
Neural Networks are inspired by biological neural systems and have also proven to be a useful
alternative for the recognition task. When compared to template matching, they are capable of more
accurate results. Very recently, in 2017, Shustanov et al. [85] demonstrated the application of a
Convolutional Neural Network to achieve 99.94 % accuracy. Not only were the authors capable
of reaching almost 100 % accuracy, they also did it at 50 Hz. Even though the success rate
is very high, the training overhead represents a significant burden to the implementation of neural
networks, being their biggest disadvantage.
Support Vector Machines are supervised machine learning algorithms that are usually used for re-
gression or classification tasks. Since the algorithm is supervised, for it to work properly, it needs
to be trained. The training generally creates a big overhead in implementation time
that is proportional to the number of signs to recognize. Nonetheless, when properly trained,
these algorithms have proven to be the most accurate classifiers, with 100 % accuracy [87]. Al-
though the results are astonishing, their major weakness when compared to other methods is the
achieved frame rate [85]. Only very recently, in 2012 [90], was a real-time
implementation on a real dataset achieved.
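The normalized cross-correlation at the core of template matching can be sketched on flattened grey-scale patches; the 6-pixel "sign pattern" is invented for illustration. Because the score normalizes out brightness and contrast, the same sign under different lighting still scores 1, whereas a rotated sign would not - which is why rotated template sets are needed.

```python
import math

# Normalized cross-correlation between a candidate patch and a stored template
# (both flattened grey-scale intensity lists of equal length). Scores near 1
# indicate a likely match.

def ncc(patch, template):
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

template = [0, 0, 255, 255, 0, 0]     # stored sign pattern
bright = [10, 10, 250, 250, 10, 10]   # same sign under different lighting
score = ncc(bright, template)
```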
3 System Architecture and Components
Contents
3.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
In this chapter, all modules belonging to the system are described: their functionality within the whole
system, along with the proposed architecture. It is also explained how the hardware and software are
interconnected and how they fit in the global picture. A high-level block diagram depicting the covered
modules and their connections is shown in Figure 3.1.
Figure 3.1: Proposal of functional architecture, including both Hardware and Software modules
3.1 Hardware
The hardware comprises the mechanical and electronic components that are arranged inside the
robot frame. These components have to be within the robot frame in order to comply with the
competition's rules [1]. Thus, given the limited space available inside the car-like robot, the constituents
of the car are chosen based on three main characteristics: price, size, and performance/efficiency. Below,
each hardware unit that composes the autonomous vehicle is listed with the rationale for its inclusion.
3.1.1 Robot chassis
As described in Section 1.3, a non-holonomic robot was chosen to mimic as much as possible a
real-world scenario. The four-wheeled frame used in the competition is a 1:8 miniature of a real
car and is depicted in Figure 3.2. It has a steering system on the two front wheels, wheel suspension,
and a mechanical differential per axle.
Figure 3.2: Robot chassis overview.
The chassis was kindly donated to us and originally belonged to a Radio Controlled (RC) car with a gas
motor. To convert the old gas motor to an electric one and support the hardware, numerous 3D-printed
parts were created. All of those parts were developed in the context of the other parallel thesis, while this
thesis focuses on the integration of all sensors, which will be described further below.
3.1.2 Actuators
For the robot to achieve its basic functionalities while also acting like a real car, a set of actuators
were added to the frame, namely:
• Motor - To replace the old gas motor, an electric motor was chosen, as it allows more accurate
speed control, reduces the complexity of its control, and is more energy efficient. The motor in
use is a brushed Direct Current (DC) motor. When compared to brushless triphasic motors,
brushed motors draw less current and are also cheaper, at the expense of top speed. Nevertheless,
since the brushed motor is capable of achieving the target speed of 2 m/s, one is incorporated in
the robot.
• Steering - Contrary to conventional motors, where the signal controls the velocity of the rotor, servo
motors control the position of the rotor according to the input. For that reason, the use of servos for
direction control is standard across the majority of non-holonomic frames. Since the chosen
chassis already included a support to hold a servo, this standard was followed.
• Braking - Again using a servo, by controlling the position of the mechanical parts it's possible
to lock or unlock a free-spinning disk mounted on the transmission. As with the steering servo,
this is standard across the majority of radio-controlled car-like frames and, since the mechanical
part is not the focus of this thesis, the standard was followed. The necessary mechanical disks
and transmission supports had to be bought separately, as those were not included in the robot
chassis.
• Lights - As in a real-life scenario, when moving in low-light environments the driver is obliged,
by law, to turn on the lights, so that they can clearly see the road ahead while warning other
drivers of their presence. To address this problem with a real-world situation in mind, a pair of
lights was included in the robot frame. Light Emitting Diode (LED) lights are chosen over halogen
ones, as they are more energy efficient and provide enough luminosity for the cameras to work
inside a tunnel with no light source other than the robot's own lights.
3.1.3 Sensors
• RGB-D camera - To obtain both color and depth information, an RGB-D sensor is used. Its use is
twofold: the road's landmarks are easy to extract from a colored image, and the depth
information can be used to obtain an accurate position of those landmarks. Depth can also be
used to segment the road from its impassable obstacles in the colored image, while the color of
the objects is used to recognize them. A Microsoft Kinect One was added to fulfill this role due to
its cheap price, its depth accuracy, and low calibration complexity when compared to other RGB-D
cameras and even stereo cameras.
• Wide-angle camera - Since the Kinect's field of view is fairly limited - 70º in width by 60º in height,
according to Microsoft's documentation - a wide-angle lens camera is used to capture the
semaphores and traffic signs. The IDS camera currently serving this purpose was previously used
as the only vision sensor and is reused here to cover the missing field of view.
• Stereo camera - Due to its characteristics, a ZED stereo camera is used as the main sensor to
estimate the pose of the vehicle. Using its proprietary VO algorithm, the manufacturer specifies a very
high accuracy, with drift as low as 0.1 % of the traveled path. Since the robot chassis in use has a
suspension, installing wheel odometry would be a very complex task. Additionally, although an encoder
could be coupled to the motor, the play in the mechanical axle responsible for turning the wheels can
cause a difference of up to 5º in wheel angle for the same steering servo command. Since the
mechanical component of the robot is not the focus of this thesis, wheel odometry is discarded.
• Inertial measurement unit - As described in Section 2.1.2.C, an inertial measurement unit is also
incorporated in the robot to complement the information provided by odometry (be it visual or wheel
odometry). It comprises a 3-axis gyroscope, a 3-axis accelerometer, and even a 3-axis
magnetometer. This combination provides an accurate and cheap way of measuring the robot's
orientation with respect to the world. The selected IMU is the MPU-9250, as it is considered one of
the most accurate sensors in its price range.
• Infrared sensors - As in recent parking-assist systems, which inform the driver about the distance
to obstacles when the vehicle is moving backward, three infrared sensors are placed on the rear of
the vehicle. These cover the angles needed for the car to perform the parking maneuvers without
incurring any collision.
3.1.4 Hardware interface
• Microcontroller - To bridge the low-level sensors and actuators - essentially all hardware units
described above, with the exception of the cameras - and the computing platform, where the
sensor data is processed and navigation commands are calculated, a microcontroller is used. Due
to its low price, open-source community, built-in programmable ports, and large amount of memory,
an Arduino Mega was chosen over the remaining microcontrollers.
• Microcontroller shield - A custom Printed Circuit Board (PCB) is designed to facilitate the connections
and to support some of the sensors, actuators, and power regulators. The circuit is directly
mounted over the Arduino pins, a configuration known in the Arduino community as an (Arduino) shield.
The advantages of using a PCB are threefold: it has the necessary connectors for sensors and actuators
to be easily plugged or unplugged; it holds and connects the smaller electronic components; and it
regulates and distributes the battery's energy to the components that require an external power source.
• Motor driver - As previously stated, a DC motor is used to make the robot move. Most
microcontrollers only work with 5 V or 3.3 V logic pins that can source very low currents.
As the motor works at higher voltages and requires a high current draw, a driver is used to adjust
the microcontroller's output signal to the required motor input.
3.1.5 Computing platform
A mini-ITX form factor computer is specifically added to fit the space and cost requirements. Compared
to a laptop, this small form-factor computer reduces the overall cost and space requirements without
compromising performance. In fact, since laptops include many components that are not needed for the
competition, e.g., keyboard, screen, and body frame, the monetary resources can be applied where
needed. As the key factor of the competition is speed, the resources are allocated to meet the high
performance requirements. This includes not only a performant Central Processing Unit (CPU) but
also a performant Graphics Processing Unit (GPU). The idea is to exploit the graphics card's number-crunching
capabilities to tackle the most time-consuming algorithms, whether for localization, perception of
the environment, or navigation purposes.
3.2 Software
The software comprises the operating system and the modules developed to run on it. A high-level
description of the goal of each module and their interconnections is provided here, since their implementation
and technical details are covered in the next chapter, Chapter 4.
3.2.1 Operating System
On top of the computing platform, the Linux-based ROS is used. Although it has only 10 years of
existence [92], it's one of the most developed and widely used robotics frameworks nowadays. The
framework's main points of interest are its distributed and modular design, simulation capabilities, and its
open-source community.
Its modular design allows each team member to focus on the development of their own modules without
having to implement the communication mechanisms between those modules. The built-in visualizer,
ROS Visualization (RViz), and simulator, Gazebo, greatly contribute to the rapid development and
debugging of the implemented algorithms. These provide bootstrapping capabilities to both beginners and
experienced users.
3.2.2 Software model of the robot
ROS also supports customized 3D models of the robot. Although the model was developed
by Francisco in [6], it's of special relevance in this thesis, given that it provides the transformation
between each sensor and the robot's chassis, typically denominated the base link frame, and enables
the visualization of the robot in RViz. The base link frame provides a coordinate system that is fixed on
the robot. The sensors, in this case, have a fixed position in relation to the base link. The transforms
between the base link and the sensors are created using the second generation of the ROS transform
library, tf2 [93], and can be visualized in Fig. 3.3.
(a) Visual representation of the base link's transforms to the sensors and actuators, visualized in RViz. (b) Graph representation of the map, odom, and base link transforms.
Figure 3.3: tf2 transforms of the designed robot.
To indicate the absolute position of the robot in relation to the map, two additional transforms
are provided: odom and map. The odom frame provides the relative pose estimation transformation
between the base link and what the odometry sources perceive. The map transform corrects the
drift of the odometry sources, providing an absolute pose of the base link with respect to a
global coordinate system, generically known as map.
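As an illustration of how these frames chain together, the sketch below composes a hypothetical map → odom correction with an odom → base link estimate to obtain the absolute pose, using a minimal 2D transform type. This is plain C++ arithmetic, not the actual tf2 API, and all numeric values are invented for illustration.

```cpp
#include <cassert>
#include <cmath>

// Minimal 2D rigid transform (x, y in meters, theta in radians) used only
// to illustrate the arithmetic behind the frame chain; this is NOT tf2.
struct Pose2D {
    double x, y, theta;
};

// Compose two transforms: if `a` maps odom into map and `b` maps
// base_link into odom, the result maps base_link into map.
Pose2D compose(const Pose2D& a, const Pose2D& b) {
    return Pose2D{
        a.x + std::cos(a.theta) * b.x - std::sin(a.theta) * b.y,
        a.y + std::sin(a.theta) * b.x + std::cos(a.theta) * b.y,
        a.theta + b.theta};
}
```

For instance, composing a map → odom correction of (0.5, −0.1, 0) with an odom → base link estimate of (2.0, 0, 0) yields a base link pose of (2.5, −0.1, 0) in the map frame, which is exactly the role the localization node plays when it publishes the map → odom transform.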
3.2.3 Software modules
A high-level description of the software modules represented in Fig 3.1 is provided in this Section.
3.2.3.A Sensor calibration parameters
In order to reduce the overall uncertainty associated with the sensors' measurements, a dedicated
module to calibrate the sensors and discover their intrinsic characteristics is created. The calibration
is performed before the competition; the calibration information is then used during the
execution of the assigned tasks to ensure optimal sensor accuracy. These calibration parameters
are inserted directly into the microcontroller and provided as a configuration file to the software drivers.
3.2.3.B Ground, lines, and obstacles detection
With the sensors calibrated, a point cloud is output by the RGB-D sensor, the Kinect One. The
point cloud is, as the name implies, a set of data points spread in 3D space. From this point cloud, the
points belonging to the floor, where the robot is able to move, are segmented from the obstacles standing
on the track, where the robot is not able to move.
From this segmentation block, two laser scans with distinct functionalities are output, namely:
Virtual Walls - By analyzing the floor's colors using image processing techniques, the lines of the track
are identified and output as a laser scan, which can be thought of as virtual walls.
Obstacles - Objects within a certain height range are detected and output as a typical laser scan.
To fulfill these tasks, this module is split into three nodes: the segmentation of the RGB-D output, the
generation of virtual walls from the track's lines, and the detection of obstacles.
The virtual wall output is provided to the Global Localization and Mapping module to enable the usage
of occupancy-grid-based methods, such as MCL, while the detected obstacles are provided directly to
the navigation and decision-making modules to decide the robot's next actions.
3.2.3.C Relative Pose Estimation
By combining all the available state estimation sources, relative or not, that are able to estimate the
vehicle's pose, a corrected and more accurate source of odometry is provided. In the present work,
the sources available for pose estimation are the stereo camera, the ZED, and the IMU, the MPU-9250.
The corrected pose estimate is then provided to the Global Localization and Mapping module, which can
thus carry out its task with a higher degree of certainty about the robot's absolute pose.
This module also provides the dead-reckoning transform between the base link and odom frames.
3.2.3.D Global Localization and Mapping
With the information of the current pose and the virtual walls generated by the novel algorithm, it is
possible to model the environment and localize the robot within the created map.
This module is divided into two nodes. The first generates a map from the observed features by
correcting the odometry path through loop-closure techniques, which consist of recognizing when a
previously visited location is revisited. The second node uses the generated map and the aforementioned
inputs to provide an absolute localization of the robot in relation to the map.
The mapping node is used before the competition to store the identified features of the environment
so that they can be used during the execution of the challenges. This increases the scalability
of the developed system, as it is able to map virtually any environment.
The localization node implements a multi-modal method to estimate an absolute robot pose in relation
to the global map.
3.2.3.E Semaphore and Traffic Sign Recognition
Using the wide-angle camera's image input, this module analyzes the surrounding environment to
identify the presence of semaphores or traffic signs. Since traffic signs can be found anywhere on the
track but semaphores are only found near the zebra crossing, the module is split into two smaller
blocks, one for each use case, allowing the segmentation and parallelization of tasks. This is especially
important to save computing resources, as these blocks are executed on demand.
4 Implementation
Contents
4.1 Hardware Interface
4.2 Sensor calibration
4.3 Ground, lines and obstacle detection
4.4 Relative Pose Estimation
4.5 Global Localization and Mapping
4.6 Semaphores and Traffic Signs Detection and Recognition
The practical implementation of each of the modules displayed in Fig. 3.1 is outlined in this chapter.
This includes the development of the hardware components required to power and interface
with the low-level sensors, and a thorough description of the ROS packages created and used to fulfill
the system's requirements.
4.1 Hardware Interface
As described in Section 3.1.4, an Arduino microcontroller is used to extract the data from the low-level
sensors and command the actuators, and a circuit board is designed to organize and facilitate the
connections while supporting the smallest components. In this section, the microcontroller interface and
the circuitry needed to interface with the hardware are detailed.
4.1.1 Power regulation and interface with small sensors and actuators
A custom PCB is designed to provide an easy interface for all sensors and actuators and to distribute
power to all electronic components inside the robot's chassis, with the exception of the cameras,
which already possess a standard Universal Serial Bus (USB) connection, and of the computing platform
and motor, which have their own dedicated power supply and driver.
The Eagle software is used for the PCB design, since it's used by a large share of existing electronics
companies and there's an extensive number of freely distributed component libraries.
As depicted in Fig. A.1 of Appendix A, the PCB has the same size and format as the microcontroller,
the Arduino Mega. This was decided so that it can be directly mounted over the
microcontroller, saving space and requiring fewer physical cabled connections between the two.
The constituting blocks of the PCB are presented in Fig. A.2 of Appendix A. The blocks’ functions
and implementation details are described below:
• Named interfaces inscribed on the board with standard flat-cable connectors that cannot be
reversed. This not only facilitates the assembly of the hardware but also prevents the connectors from
being incorrectly connected, consequently avoiding damage to the sensors and actuators.
• As the battery's working voltage varies from 21 V to 24 V and the majority of the hardware
components work at lower voltages, power regulators are used to regulate the battery input. The Texas
Instruments LM338T is used due to its capability of outputting more than the predicted current
usage of 5 A, its price, and its simplicity.
• Support and incorporation of the smaller sensors and actuators, creating a direct connection to the
microcontroller and saving space and cabled connections. The smaller electronic
components directly incorporated in the board are the MPU-9250 with its logic level converter, and
a light switch.
The MPU-9250 needs a logic level converter to translate its 3.3 V logic to the 5 V logic of the
microcontroller.
The light switch is implemented with an NPN Darlington transistor pair, since it provides a high
current gain over the microcontroller's controlling signal, which is limited to 50 mA, to drive the
1 A LEDs. Also, the switch is only enabled at logic level '1'. This is especially important when
the sensors are already connected to the board but the microcontroller has not yet started: since
the microcontroller's outputs are at logic level '0' while it is disabled or initializing, the lights
stay off during that period.
With the sensors and actuators connected to the microcontroller, the sensor values are read and
the actuators commanded by an Arduino program.
4.1.2 Bridging between the low-level hardware and the computing platform
As described in Section 3.1.4, an Arduino Mega is used to bridge the lower-level sensors
and actuators and ROS.
To accomplish this, the microcontroller iteratively reads each sensor value and sends
it via UART to the computing platform. However, when a message is received via UART from the
computing platform, the Arduino interrupts its current execution and applies the most recent command to
the actuators. The connections and protocols used between each hardware unit, the Arduino, and the
computing platform are detailed below:
• Inter-Integrated Circuit (I2C) is the protocol used by the MPU-9250 to communicate. The protocol
requires two wires, one for the clock and one for the data, and provides a synchronous communication
channel between multiple master and slave devices. The master devices drive the clock
signal and request data from the slave devices, while the slave devices simply respond to the
requests performed by the master. In this case, the Arduino is the only master device and requests
the values of the slave device's sensors, namely the gyroscope, accelerometer, and compass of
the MPU-9250.
• The Universal Asynchronous Receiver-Transmitter (UART), or Serial as it's known in the Arduino
environment, is the protocol used by the Arduino to interface with a computer. The Arduino has a built-in
USB-to-Serial chip, enabling the connection to a computer via the standard USB protocol. On
this communication channel, the Arduino sends the sensor values to be interpreted and receives the
actuator commands.
• Generic Input/Output (I/O) pins are programmable pins whose function is defined in the microcontroller's
initialization routine. The function each of these programmable pins assumes depends on the
hardware connected to it. For that reason, the hardware components using these pins are
listed:
– The range sensors are connected to Analog pins, i.e., pins with an Analog-to-Digital Converter
(ADC), since these sensors output a voltage ranging from 0 V to 5 V.
– To send the velocity and angle commands to the motor and servos, respectively, Pulse Width
Modulation (PWM) pins are used. PWM is a technique to approximate an analog value
using digital signals. A digital signal only assumes two discrete levels, '1' or
'0'. The technique simulates an analog signal by switching the logic level at a very high
frequency while controlling the fraction of time that the signal spends at logic level '1'. With this, it's
possible to output a simulated analog value ranging from the voltage of logic level
'0', which is 0 V, to that of logic level '1', which corresponds to 5 V.
– Given the simplicity of the light switch, a simple Digital pin is enough to turn the lights on
or off. This is accomplished by sending a signal at logic level '1' or '0' to turn the lights
on or off, respectively.
To make the most of ROS' capabilities, the Arduino uses rosserial [94] to receive and send
ROS messages directly from or to a ROS node via its Serial port. These messages are standardized
and encouraged by the ROS community, hence the standards are followed. The messages exchanged
by the microcontroller and the computing platform, where ROS is installed and the nodes are executing, are:
• sensor_msgs/Imu holds the values of the accelerometer and gyroscope converted from the MPU-9250's
raw values to m/s² and rad/s, respectively, and the absolute orientation provided by the
compass, expressed as a quaternion.
• sensor_msgs/Range carries the data of a single infrared range sensor, converted to m.
• geometry_msgs/Twist contains the commands provided by the Navigation and Control modules
that are applied to the actuators.
4.2 Sensor calibration
To allow the correction of systematic errors and reduce the overall uncertainty associated with each
sensor measurement, each sensor is calibrated and its intrinsics are stored. Not only does calibration
allow the reduction of uncertainty, it also enables the conversion of the raw values to International
Organization for Standardization (ISO) units, which are used in the ROS messages as described
in Section 4.1.2.
The sensors' calibration is done at two different levels. For the sensors connected to the microcontroller,
an Arduino calibration program is available. The sensors directly connected to the computing
platform, namely the cameras, have their own calibration methods embedded in the drivers. The
calibration methods are described in detail in the following subsections.
4.2.1 Low-level sensors calibration
To accelerate the low-level sensors' calibration, an Arduino calibration program is created. It provides
two different methods to calibrate the sensors involved, namely:
• a semi-automated way to calibrate the IMU's accelerometer, gyroscope, and magnetic compass.
It's semi-automated since it's necessary to excite the IMU along all axes by moving the robot in a
figure-eight pattern. After the calibration process, the obtained calibration values are exported to
the Arduino code to compensate for the IMU sensors' biases and scale factors.
• a semi-automated way to export the raw values of the infrared range sensors captured under different
conditions, which allows a quicker conversion of these raw values into ISO units. Since different
materials have different reflection coefficients, the calibration program accelerates the data
collection: the user sets the distance and the program outputs the corresponding raw values.
The source-code of the program can be found on the project’s GitHub repository [95].
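As an illustration of how the exported calibration samples might be turned into distances at run time, the sketch below linearly interpolates over (raw ADC value, meters) pairs. The table values used in the usage note are invented for illustration, not the thesis' measured data:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical conversion of an infrared sensor's raw ADC value to meters
// by linear interpolation over calibration samples.
double raw_to_meters(int raw, const std::vector<std::pair<int, double>>& table) {
    // `table` is assumed sorted by raw value, ascending; clamp outside it.
    if (raw <= table.front().first) return table.front().second;
    if (raw >= table.back().first) return table.back().second;
    for (std::size_t i = 1; i < table.size(); ++i) {
        if (raw <= table[i].first) {
            double t = double(raw - table[i - 1].first) /
                       double(table[i].first - table[i - 1].first);
            return table[i - 1].second + t * (table[i].second - table[i - 1].second);
        }
    }
    return table.back().second;
}
```

With an invented table such as {(100, 0.80 m), (300, 0.40 m), (600, 0.15 m)}, a raw reading of 200 would interpolate to 0.60 m; a real table would hold one such curve per sensor and target material.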
4.2.2 Kinect sensors calibration
Regarding camera calibration, Microsoft's Kinect is the only sensor requiring it. The
wide-angle camera's sole purpose is to provide an image to detect traffic and vertical signs, which doesn't
require knowing the camera parameters in the first place. As for the stereo camera, the ZED, it is factory
calibrated, requiring no additional attention.
Since the Kinect One possesses two different sensors that are combined to form a single RGB
image with depth information, the RGB and Infrared (IR) sensors' intrinsics are calculated individually
and then combined. The calibration program is already present in the Kinect's ROS driver,
iai_kinect2 [96], which is based on the calibration methods provided by the Open Source Computer Vision
Library (OpenCV) [97]. The sensors' intrinsics are described using the pinhole camera model. With this,
it's possible to correct the misalignments between the RGB and IR grids, generating an output point cloud
with less error.
4.3 Ground, lines and obstacle detection
Using the Kinect's point cloud output, the ground plane is segmented from the surrounding
environment. From the ground point cloud, the road lines are extracted and treated as virtual walls
to enable the usage of occupancy-grid-based mapping and localization algorithms. From the remaining
point cloud, obstacles with height are detected and passed to the navigation module.
This section explains the current implementation of the nodes that convert the Kinect's
sensor_msgs/PointCloud2 into two sensor_msgs/LaserScan messages, one being the virtual walls generated from
the road lines and the other the detected obstacles. To accomplish this, three nodes are used, namely,
the point cloud segmentation block, the line extraction block, and, finally, the obstacle detection block.
The first two blocks are developed for the present master's thesis and, for that reason, the algorithms
behind them are thoroughly described in the following subsections.
4.3.1 Ground plane segmentation
In line with what was described, this node is responsible for separating the points of the Kinect's point
cloud belonging to the ground plane, where the track is, from the remaining objects.
4.3.1.A Algorithm
The implementation of this algorithm is heavily supported on the Point Cloud Library (PCL) [98] and
is a result of the following sequence of operations:
1. Given that the Kinect's coordinate system is not aligned with the base link coordinate frame, the
point cloud first undergoes a frame transformation. This transformation keeps the coordinate
frames consistent, rendering the code easier to read and understand, but is needed
most importantly because the virtual laser scan must be provided in the robot's frame.
2. After that, Pass Through and Voxel Grid filters are applied to downsample the input cloud, using
the pcl::PassThrough and pcl::VoxelGrid classes, respectively. Each filter is further described below:
• The Pass Through filter, as the name implies, removes all the points that are not within its
specified limits. Since the Kinect's maximum range is specified as 5 m, points
farther than that distance are removed.
• The Voxel Grid filter divides the space of an input point cloud into 3D boxes, also called voxels,
whose size can be defined. Within each box, all points are replaced by their
centroid and average color. This filter is used not only to downsample the point cloud but also
to keep a more even density distribution of the points, since the density of points drastically
decreases with distance.
These operations heavily reduce the number of points and, consequently, the computational
complexity of the tasks carried out ahead.
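The centroid-per-voxel idea behind the Voxel Grid filter can be sketched as follows; pcl::VoxelGrid additionally averages color and is far more optimized, so this is only an illustration of the technique, not the class's implementation:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Pt { double x, y, z; };

// Bucket every point into a cubic cell of side `leaf` and replace each
// cell's contents by their centroid.
std::vector<Pt> voxel_downsample(const std::vector<Pt>& cloud, double leaf) {
    std::map<std::tuple<long, long, long>, std::pair<Pt, int>> cells;
    for (const Pt& p : cloud) {
        auto key = std::make_tuple(static_cast<long>(std::floor(p.x / leaf)),
                                   static_cast<long>(std::floor(p.y / leaf)),
                                   static_cast<long>(std::floor(p.z / leaf)));
        auto& cell = cells[key];  // zero-initialized accumulator on first use
        cell.first.x += p.x;
        cell.first.y += p.y;
        cell.first.z += p.z;
        cell.second += 1;
    }
    std::vector<Pt> out;
    for (const auto& kv : cells) {
        int n = kv.second.second;
        out.push_back({kv.second.first.x / n, kv.second.first.y / n,
                       kv.second.first.z / n});
    }
    return out;
}
```

With the 0.025 m leaf used later in this section, any cluster of raw Kinect points falling inside the same 2.5 cm cube collapses into a single representative point.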
3. To identify the ground plane, plane models are fitted to the point cloud and the
angle between each detected plane and the ground is calculated iteratively. The computation of the
planes' normal vector coefficients is carried out by the pcl::SACSegmentation class.
The class uses robust estimators to compute the parameters of a mathematical model from a set
of observed data that contains outliers. To check for the presence of the ground, the algorithm
iteratively performs the following steps:
(a) Starting from the biggest plane in the input data set, RANdom SAmple Consensus
(RANSAC) estimates the plane's normal vector coefficients iteratively, removing the outliers
that are not within a defined distance threshold. With the plane's coefficients estimated, the
angle, θ, between the detected plane's normal vector, np, and the XY plane's normal vector,
i.e., the ground plane's normal vector, ng, is calculated using the equation:

θ = arccos( (np · ng) / (|np| |ng|) ),    (4.1)

where |np| and |ng| represent the magnitudes of the detected plane's and ground plane's normal
vectors, respectively. The magnitude of a vector, v, is calculated by resorting to the equation:

|v| = √(vx² + vy² + vz²),    (4.2)

where vx, vy, and vz are the x, y, and z components of the vector, respectively.
(b) If the calculated angle, θ, is within a user-defined threshold, the detected plane is considered
the ground plane. Otherwise, the computed plane is removed from the initial data set and a new
plane estimation is performed.
(c) This process repeats until a user-defined percentage of the original point cloud data has been
processed. If this limit is reached without the angle condition being met, it's considered
that there is no ground plane and the algorithm waits for the next point cloud to arrive.
4. When the ground is detected, the ground inliers are extracted using the pcl::ExtractIndices
class, which simply extracts the identified inliers' indices from an input point cloud into a new point
cloud. The remaining indices of the original point cloud are considered the obstacles within the
track. These two point clouds, representing the detected ground and obstacles, are the output of
this block.
The different point cloud transformations are visually represented in Fig. 4.1 using RViz.
(a) Kinect's output point cloud. (b) Downsampled point cloud. (c) Segmentation of the ground point cloud, represented in white, and the obstacles point cloud, represented in green.
Figure 4.1: Visual description and representation of the ground and obstacles segmentation algorithm.
4.3.1.B User-Defined Inputs
• Leaf size - The size, in meters, of the side of the cube used by the Voxel Grid filter.
• Ground detection angle - The angle threshold between a plane detected in the point
cloud and the ground plane, below which the detected plane is considered the ground.
• Processed cloud points - The percentage value, ranging from 0 to 1, used in step 3c to consider the
original point cloud processed. If the value equals 1, the stopping condition is only met when all
points of the cloud have been processed. On the other hand, a value of 0 causes the algorithm to
only consider the plane with the most points, so there is no loop.
• Estimation distance - Represents the distance threshold, in meters, used by RANSAC for a point
to be considered an inlier of a plane during the estimation.
4.3.1.C Implementation Details
The implemented voxel grid leaf size is 0.025 m. Since the lines have a 0.05 m thickness, the
downsampled point cloud will contain two points across a line's width, which is enough to detect it. With
this size, the number of points is heavily reduced, drastically increasing the speed of the algorithm. The
same leaf size is used for RANSAC's estimation distance, so that a point is only considered an inlier if it
is, at most, one voxel away from the estimated plane.
The present implementation uses RANSAC as the robust estimator, since it's simple and delivers
both accuracy and speed. The SACSegmentation class allows other robust estimators; however,
those are based on RANSAC and are more complex. RANSAC provides the desired performance,
hence the decision to use it.
Since the robot chassis in use has a suspension, the camera may be subject to oscillation,
increasing the angle between the camera and the ground plane. To accommodate this and
keep a safe margin, a detection angle threshold of 5º is used.
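The ground test of Eqs. 4.1 and 4.2 with this 5º threshold can be sketched as follows; this is a stand-alone illustration of the geometric check, not the pcl::SACSegmentation internals:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Eq. 4.2: vector magnitude.
double magnitude(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Eq. 4.1: angle between a candidate plane normal and the ground normal.
double angle_between(const Vec3& np, const Vec3& ng) {
    double dot = np.x * ng.x + np.y * ng.y + np.z * ng.z;
    return std::acos(dot / (magnitude(np) * magnitude(ng)));
}

// Accept the plane as ground when the angle to the XY plane's normal
// (the z axis, since the cloud is already in the base_link frame) is
// below the threshold (5 degrees ~= 0.0873 rad).
bool is_ground(const Vec3& plane_normal, double threshold_rad) {
    const Vec3 ng{0.0, 0.0, 1.0};
    return angle_between(plane_normal, ng) <= threshold_rad;
}
```

A perfectly horizontal plane (normal along z) passes the test, a slightly tilted one caused by suspension sway still passes, and a vertical surface such as an obstacle's side is rejected.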
4.3.2 Road lines extraction
Using the ground point cloud, a scaled image is created and processed to extract the track's lines.
PCL does not provide the necessary color processing tools to achieve this and, since the ground point
cloud lies on a 2D surface, like an image, the current implementation relies on OpenCV instead.
4.3.2.A Algorithm
To transform the input point cloud into a virtual laser scan, the following operations are
performed:
1. The minimum and maximum values of the x and y axes of the input cloud are stored to enable
the conversion from the robot's coordinate frame to the image coordinate frame. This also allows
the calculation of the minimum width and height needed to store the image with no margins.
2. With those values stored, a scaled image is created, where each pixel contains the information
of a single voxel from the ground segmentation block. In fact, when the Kinect's point cloud is
downsampled, a single colored point remains in each voxel, so the ground point cloud is already
organized as a grid. Each pixel of the scaled image stores the RGB information of one voxel and,
although the two lie in different coordinate systems, the coordinates can be converted using the
equations:

Iy = (Px − Rx) / ps,
Ix = (Py − Ry) / ps,    (4.3)

where Ix and Iy represent the image coordinates along the x and y axes, Rx and Ry
represent the values of the robot's coordinate frame along the x and y axes, Px and Py are the
furthermost points of the cloud on the x and y axes, respectively, and ps is a constant that represents
the size of each pixel, in meters, which is the same as the size of a voxel.
3. Then, the created image is converted from the RGB color space to the CIELAB color space, which allows the separation of the luminosity component of the image from the color components. The advantages of using the CIELAB color space are twofold:
• Since each color has its own dedicated channel, contrary to RGB, it's easier to create a color filter that detects only white, the color of the road's lines.
• As the luminance component is in a dedicated channel, the color channels vary much less than in the RGB color space when the lighting conditions change. This also increases the resilience of the algorithm to different lighting conditions.
By applying the color space conversion and color filter, a binary image is generated. In it, white pixels are considered road lines and black pixels are considered lanes or free space.
4. With the color filter, both continuous and non-continuous lines are captured. Since the robot can drive over non-continuous lines, those should not be detected as virtual walls. To remove them from the binary image, the area of every object, in pixels, is calculated using OpenCV's findContours function. If an object's area doesn't reach a minimum threshold defined by the user, it is removed from the image.
5. With only continuous lines left on the image, to conclude the process, the virtual laser scan is created by generating a number of virtual beams that start from the car's origin, base_link's (0, 0) point, and extend until they reach a white pixel. To calculate the pixel's x and y components in the car's coordinate frame, Eq. 4.3 is used. Then, the magnitude of the beam, which corresponds to the effective distance from the pixel to the robot, is calculated using the vector magnitude equation already shared in Eq. 4.2, assuming that the z component is 0. The number of virtual beams depends on the sensor's field of view and the angle increment, both input by the user. If no white pixel is found, the laser beam is set to the value 0.
The described procedure's steps can be visually observed in Fig. 4.2.
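The coordinate conversion of Eq. 4.3 can be turned into a small helper, sketched below under the stated conventions (Px, Py as the furthermost cloud coordinates, ps the pixel size), together with the inverse mapping used in step 5. Function names are illustrative, not the thesis code.

```python
def robot_to_image(rx, ry, px_max, py_max, ps):
    """Map robot-frame coordinates (m) to image pixel indices via Eq. 4.3."""
    iy = round((px_max - rx) / ps)
    ix = round((py_max - ry) / ps)
    return ix, iy

def image_to_robot(ix, iy, px_max, py_max, ps):
    """Inverse mapping, used when converting beam hits back to the car's frame."""
    rx = px_max - iy * ps
    ry = py_max - ix * ps
    return rx, ry
```

For example, a point at (2.0 m, 0.5 m) in the robot frame, with furthermost cloud coordinates (2.5 m, 1.0 m) and 0.025 m pixels, maps to pixel (20, 20).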
4.3.2.B User-Defined Inputs
• Angle increment - The angle interval, in degrees, between each virtual laser beam. The lower
the value, the more beams are generated for the field of view of the Kinect.
• Line minimum area - The minimum area, in m2, to consider a line continuous.
• Luminosity range - Since the only variable that can affect the color filter is the lighting condition of the environment, as the color of the line doesn't change, it is set as a user parameter. It represents the range of CIELAB luminosity values applied in the color filter to detect the line.
4.3.2.C Implementation Details
As the luminosity threshold depends on the conditions of the environment, a script is created to visualize the effect of the luminosity range and calibrate it. As depicted in Fig. 4.3, the script allows the user to change the range and visualize the output live. The same figure also demonstrates that, to detect the white line, the luminosity has to range from 130 to 255.
Regarding the detection of the continuous line only, as the size of each dash of the dashed line is fixed, 0.2 m by 0.05 m, the minimum detection area can be calculated directly. Since the resulting area is 0.01 m², the minimum area is set to 0.015 m² to include a safety margin.
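The dash-removal idea, computing each connected object's pixel area and discarding small ones, can be sketched in pure Python as a stand-in for the OpenCV findContours-based filter described above (the minimum area here is in pixels rather than m²; converting is a division by ps²):

```python
from collections import deque

def filter_small_components(binary, min_area):
    """Keep only white 4-connected components with at least min_area pixels."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not seen[sy][sx]:
                comp, queue = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_area:  # small components (dashes, noise) are dropped
                    for y, x in comp:
                        out[y][x] = 1
    return out
```

On a toy grid with a ten-pixel line and a three-pixel dash, a five-pixel threshold keeps only the line.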
From the results obtained in Chapter 5, an angle increment of 1° is demonstrated to be enough to provide the expected results in terms of accuracy.
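Step 5 of the algorithm in Section 4.3.2.A can likewise be sketched as a simple pixel-space ray march. The field of view, origin convention, and helper names below are illustrative assumptions, not the exact implementation:

```python
import math

def cast_virtual_beams(binary, origin, ps, fov_deg=60.0, inc_deg=1.0, max_steps=200):
    """March one virtual beam per angle from the origin pixel until a white pixel.

    binary: 2D grid of 0/1 values; origin: (row, col) of base_link's (0, 0) point;
    ps: pixel size in meters. Returns one range [m] per beam, 0.0 when no line is hit.
    """
    oy, ox = origin
    ranges = []
    for i in range(int(fov_deg / inc_deg) + 1):
        ang = math.radians(-fov_deg / 2.0 + i * inc_deg)
        hit = 0.0
        for step in range(1, max_steps):
            y = oy - round(step * math.cos(ang))  # forward = decreasing row index
            x = ox + round(step * math.sin(ang))
            if not (0 <= y < len(binary) and 0 <= x < len(binary[0])):
                break  # beam left the image without hitting a line
            if binary[y][x]:
                hit = step * ps  # beam magnitude: steps along the ray times pixel size
                break
        ranges.append(hit)
    return ranges
```

For a horizontal line of white pixels 30 steps ahead of the origin, the central (0°) beam reports 30 · 0.025 m = 0.75 m.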
4.3.3 Obstacle Detection
With the output of the ground segmentation block, a point cloud containing only the impassable
objects that are standing on the track is processed to output a laser scan message. With this, two
distinct objects of the competition are detected, namely, the lane obstacles, which are green boxes that
(a) Ground segmented point cloud visualized in RViz. (b) Scaled image generated from the ground point cloud.
(c) CIELAB color filter output image. (d) Removal of the dashed line and noise.
(e) Visual representation of the virtual beams in red. (f) Ground segmented point cloud and virtual laser scan output, in red, visualized in RViz.
Figure 4.2: Visual description and representation of the generation of the virtual laser scan from an input colored point cloud.
occupy the full width of a lane, and the in-construction zone’s cones. Both are output as a laser scan
message as this is the requirement imposed by the navigation and decision-making modules created on
a parallel thesis [6].
To accomplish this, the point cloud is iterated to identify objects within a specified height and range from the moving robot. Although this could easily be achieved with PCL, the pointcloud_to_laserscan [99] package already implements this functionality and is used instead.
(a) No filter applied (b) Forward Template
Figure 4.3: Script interface used for the calibration of the CIELAB’s channels range values.
4.3.3.A User-Defined Inputs
• Minimum and Maximum Height - Define the minimum and maximum values, in meters, on the z axis in which to search for obstacles. Points outside of this range are ignored.
• Minimum and Maximum Distance - Define the minimum and maximum radial distance within which points must lie. Points outside of this radius are ignored.
• Angle increment - The angle interval, in radians, between each laser beam. The lower the value,
the higher the resolution of the output laser scan.
4.3.3.B Implementation Details
The obstacles' height, according to the competition rules, ranges from 0.2 m to 0.3 m. Thus, the search height is set to start from the ground, 0 m, and to end at the maximum height, 0.3 m. For the minimum and maximum ranges, the specifications of the Kinect are applied, which correspond to 0.5 m and 5 m.
From the experimental tests carried out in Chapter 5, an angle increment of 1° is enough to identify the cones and track obstacles at the maximum range distance.
The output laser scan of the module is depicted in Fig. 4.4.
Figure 4.4: Visual demonstration of the output of the obstacle detection node, in red, from the input obstacles point cloud, in green.
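The height- and range-windowed conversion performed by the pointcloud_to_laserscan package can be approximated by the following sketch. The parameter defaults mirror the values chosen above; this is an illustration of the idea, not the package's code:

```python
import math

def points_to_scan(points, angle_min=-math.pi / 4, angle_max=math.pi / 4,
                   angle_inc=math.radians(1.0), z_min=0.0, z_max=0.3,
                   r_min=0.5, r_max=5.0):
    """Bin 3D obstacle points into a planar scan, keeping the closest hit per angle."""
    n = round((angle_max - angle_min) / angle_inc) + 1
    ranges = [float('inf')] * n
    for x, y, z in points:
        if not z_min <= z <= z_max:
            continue  # outside the obstacle height window
        r = math.hypot(x, y)
        if not r_min <= r <= r_max:
            continue  # outside the sensor's radial window
        ang = math.atan2(y, x)
        if angle_min <= ang <= angle_max:
            i = round((ang - angle_min) / angle_inc)
            ranges[i] = min(ranges[i], r)  # closest obstacle wins the angular bin
    return ranges
```

A point above the 0.3 m window is discarded, and of two obstacles on the same bearing only the nearer one appears in the scan.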
4.4 Relative Pose Estimation
As unveiled in Section 3.1.3, cameras are used to estimate the pose of the vehicle through Visual Odometry techniques instead of wheel odometry. However, both methods are based on a dead-reckoning approach and are, thus, relative methods subject to cumulative errors. To counteract the errors associated with the visual odometry, besides calibration, it is combined with an IMU to provide a more accurate and stable pose estimate. In this section, the implementation of all modules that contribute to the pose estimation is covered.
4.4.1 Visual Odometry
The ZED stereo camera is mounted on the robot frame with the main purpose of providing a reliable source of odometry. The creators of the ZED provide an SDK in which the pose estimate is calculated. However, the SDK is not open-source and, consequently, it's not possible to examine the implemented algorithm.
Nevertheless, they also provide a ROS wrapper, named zed_ros_wrapper, which encapsulates the ZED SDK functions and enables its usage over ROS. The package provides a 3D pose and the pose covariance.
4.4.1.A User-Defined inputs
The package provides a very limited set of parameters with which to tweak the obtained pose and, consequently, its accuracy. The parameters that were confirmed to influence the pose accuracy
are:
• Resolution - The size in pixels of each camera image. One can opt for: 2.2K (2208x1242), 1080p (1920x1080), 720p (1280x720), or WVGA (672x376).
• Frame rate - The number of frames captured by each camera per second. Depending on the resolution, the frame rate varies from 15 Frames Per Second (FPS) to 100 FPS.
• Floor Alignment - When set to true, the VO algorithm will attempt to identify the ground and take it into consideration in the pose estimate.
• Pose Smoothing - When set to true, the output pose is smoothed, i.e., small discrete jumps in any state variable of the vehicle's pose are filtered out.
4.4.1.B Implementation details
The ZED allows different combinations of resolution and frame rate depending on the application's needs. The experimental tests carried out show that higher frame rates are required for good accuracy, especially at higher speeds. However, since the ZED combines image and depth information to track the 3D pose of the camera, the higher the resolution, the more accurate the depth information and, consequently, the pose. In order to maintain a good balance between resolution and frame rate, i.e., good accuracy at both lower and higher velocities, a resolution of 720p at 60 frames per second is utilized.
Both the Floor Alignment and Pose Smoothing parameters are enabled, since both were demonstrated to improve the output results. Floor Alignment prevents the ZED from drifting on the z axis, i.e., from estimating an upward or downward movement, since it always keeps track of the height to the floor, drastically improving the accuracy. Pose Smoothing prevents the pose from drifting when the camera is almost steady, thus improving the pose estimate when the car is stopped.
4.4.2 Sensor Fusion
To fuse all the available sensors capable of outputting a state variable, the robot_localization [100] package is used, due to its numerous advantages: it allows an arbitrary number of sensors; it supports the standard messages used for this effect, namely nav_msgs/Odometry, sensor_msgs/Imu, and geometry_msgs/PoseWithCovarianceStamped; and it provides a continuous state estimation even when one of the sensors temporarily provides no information. To perform the state estimation, an Extended Kalman Filter is used.
4.4.2.A User-Defined Inputs
Given that the robot_localization package allows an arbitrary number of sensors, even sensors that provide the same information, the number of inputs and each sensor's individual configuration have to be provided via a configuration file. In the present implementation, the relevant inputs are:
• 2D mode - Although the present sensor fusion algorithm uses a 3D omnidirectional motion model, as the name of the parameter implies, it's possible to ignore the z, roll, and pitch state variables in order to output a 2D pose. This effectively zeroes those state variables, which may be adequate in situations where the robot operates in a planar environment.
• Sensor's State Variables Configuration - The robot_localization package tracks a 15-dimensional state of the vehicle (x, y, z, roll, pitch, yaw, ẋ, ẏ, ż, ˙roll, ˙pitch, ˙yaw, ẍ, ÿ, z̈), where each sensor can contribute to each of these state variables. To define whether a sensor contributes to a specific state variable, a matrix of boolean values has to be provided. By setting a boolean to true, the sensor fusion algorithm will incorporate that sensor's readings into that specific state variable's estimate; otherwise, it won't. The state variable configuration matrix has the form presented in the equation:

[ x      y      z
  roll   pitch  yaw
  ẋ      ẏ      ż
  ˙roll  ˙pitch ˙yaw
  ẍ      ÿ      z̈ ]    (4.4)
• Covariance of the state variables - The covariance is used by the sensor fusion algorithm to evaluate the confidence, or weight, of the estimate that a sensor provides for a particular state variable. The smaller the covariance value a sensor reports for a state variable, the more confidence is given to that sensor's estimate of that variable.
• Relative - If the sensor has an absolute source of pose estimation, this parameter should be set to false; if the sensor is relative, i.e., estimates the pose with a dead-reckoning approach, it should be set to true. When set to true, the sensor fusion algorithm interprets the first input value as a 'zero' reference.
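The covariance-as-weight idea can be illustrated with a scalar inverse-variance fusion, a simplified analogue of how the EKF weighs low-covariance sensors more heavily. The function and the numeric values are illustrative, not robot_localization internals:

```python
def fuse(estimates):
    """Fuse scalar (value, variance) estimates by inverse-variance weighting."""
    weights = [1.0 / var for _, var in estimates]  # low variance -> high weight
    total = sum(weights)
    value = sum(v * w for (v, _), w in zip(estimates, weights)) / total
    return value, 1.0 / total  # fused variance is smaller than either input

# e.g. a yaw estimate with low covariance dominates one with higher covariance:
fused_yaw, fused_var = fuse([(1.0, 0.01), (2.0, 0.09)])
```

With these inputs, the fused value lands at 1.1, much closer to the low-covariance estimate, which mirrors the yaw configuration discussed below.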
4.4.2.B Implementation Details
The current competition track is on a planar surface, so there are only subtle variations in the ground plane. Since these are of little relevance, the 2D mode is enabled, effectively zeroing the z, pitch, and roll state variables.
Considering that the two sensors capable of providing an estimate to the state variables are the ZED
and the MPU-9250, those were individually configured.
For the ZED, the configuration matrix is set to use the x, y, and yaw values. Since it's based on a dead-reckoning pose estimation approach, the relative parameter is enabled.
For the IMU, the configuration matrix is set to use the yaw, ˙yaw, ẍ, and ÿ values. Since the IMU has a compass, which provides an absolute orientation with respect to Earth's magnetic field, the relative parameter is disabled.
The only state variable provided by both sensors is the yaw. As will be shown in the next chapter, since the IMU has a higher accuracy in the yaw estimate, its covariance was deliberately set to a lower value than the ZED's.
4.5 Global Localization and Mapping
4.5.1 Global Map
To carry out the task of creating an Occupancy-Grid map to serve as a base for the localization and navigation modules, OpenSlam's GMapping [101] package is used, as it was demonstrated in [102] to be the most accurate algorithm when compared to its competitors.
Since the environment is unknown, GMapping uses a Rao-Blackwellized PF to solve the simultaneous localization and mapping problem. The key idea of this PF is that both the trajectory, x1:t, and the map, m, are estimated from the detected environment features, named observations, z1:t, and the odometry measurements, u0:t−1, by taking advantage of the factorization presented in the equation:

p(x1:t, m | z1:t, u0:t−1) = p(m | x1:t, z1:t) · p(x1:t | z1:t, u0:t−1). (4.5)
This way, it is possible to calculate the posterior of the map and trajectory in two distinct steps,
allowing an efficient computation. In this case, each particle corresponds to a map that is built from all
observations and potential trajectories.
4.5.1.A User-Defined Inputs
For the GMapping algorithm to be as effective as possible in different scenarios, numerous parameters can be defined to adjust it to the environment at hand. The most relevant inputs for GMapping in the current implementation are:
• Map Resolution - The resolution of the occupancy grid defines, in meters, the size of each cell in
the output map. The smaller the size of each cell, the more granular is the output map.
• Map Size - The full length and width of the map, in meters, are defined by the difference between the maximum and minimum values of the x and y axes.
• Particles - Since GMapping is implemented with a finite number of particles, the higher the number of particles defined by the user, the more posteriors are computed and, potentially, the more accurate the output map.
• Odometry error - The odometry errors in translation and rotation are modeled inside GMapping
as a function of translation and rotation. Decreasing this value causes the algorithm to trust the
odometry more, computing trajectories closer to the odometric path.
• Matching score - A minimum score to consider the outcome of the scan matching valid can be provided, avoiding jumps in the pose estimates in large open spaces. This is especially important for laser scanners with limited range, which is the case here.
4.5.1.B Implementation Details
The current implementation defines that the size of each cell of the Occupancy-Grid is 0.025 m to
match the granularity of the downsampled point cloud that is provided to calculate the Kinect’s virtual
laser scan.
The provided length and width are 20 m and 10 m, respectively. As the track is 16.75 m by 6.95 m, it fits with some margin for error.
To increase the number of loop closure detections, the number of particles was increased from 30 to 100. It's also observed that increasing the number of particles increases the similarity between the real and generated maps.
In the majority of the literature, GMapping included, it is assumed that the odometry error is orders of magnitude larger than the error of the laser. However, in the present case, since the laser scan is virtually created, this assumption does not hold. For that reason, and since the odometry error is quite small, as demonstrated in Chapter 5, the expected odometry error is halved while the matching score is increased to 100, to prevent incorrect matchings caused by the lower accuracy of the virtual laser scan.
4.5.2 Absolute Localization
In consonance with what is stated in Section 3.2.3.D, a multi-modal localization approach is required
in order to deal with the global localization problem and, to some extent, the kidnapped robot problem.
To deal with this problem, the AMCL approach of Fox et al. [20], which was already described in Section
2.1.1.C, is implemented using the amcl package [103].
Due to its capacity to change the number of particles depending on the approximation error, or weight, of each particle, and its ease of implementation over a known map, AMCL is widely adopted in the robotics industry. For all those reasons, the current localization module implements AMCL.
4.5.2.A User-Defined Inputs
The ease of implementation comes also from the well structured and described parameters. The
ones that are more relevant for the current implementation of the localization module are:
• Initial Pose and Covariance - The initial pose represents the exact place on the map from which the robot starts the localization, while the initial covariance represents, in meters, the distribution of each state variable around the initial location. The higher the covariance, the more spread out are the particles in the initial state.
• Particles - Since the number of particles is adaptive, it’s possible to configure both the minimum
and maximum particles used by the algorithm.
• Odometry type - The current implementation of AMCL considers two odometry models for the
moving robots, either omnidirectional or differential.
• Odometry alphas - The odometry alphas in AMCL allow the user to model the odometry error, where the expected noise of the odometry's translational and rotational components is provided with respect to the translational and rotational motion of the robot. In practice, the higher these values, the more spread out the particles will be around the odometric path during the sampling phase.
4.5.2.B Implementation Details
According to the competition rules, the autonomous vehicle always starts from the same location. Given this, the initial robot pose with respect to the map frame is provided with a small covariance value, since the confidence that the robot is exactly at that pose is very high.
In line with the previous chapters, the built robot is non-holonomic and better described by differential constraints than by omnidirectional motion constraints. Thus, the selected odometry type is differential. For the same reasons described in the GMapping node implementation details, Section 4.5.1.B, the expected error for the odometry alphas is reduced by a factor of four. For that same reason, since the error is always kept small, the maximum number of particles is also decreased from 5000 to 200.
4.6 Semaphores and Traffic Signs Detection and Recognition
Given the simplicity of the semaphore signs when compared to the traffic signs, and their importance in the competition, as they are used to start every single challenge and to provide instructions during the execution of specific tasks, a pragmatic image processing method is implemented for the detection of the semaphores. Due to the number and complexity of the traffic signs, however, a neural network is used to recognize them instead.
4.6.1 Semaphore Detection and Recognition
Based on state-of-the-art approaches, the implemented method uses a color filter to perform a preliminary detection and, then, a template matching technique to confirm the presence of a semaphore sign.
The current implementation of this module relies on OpenCV, as it provides all the image processing functions needed to identify the semaphore shapes.
4.6.1.A Algorithm
Upon the request of semaphore recognition, the module performs the following operations:
1. The input image is first converted from the RGB color space to the CIELAB color space for the same reasons described in Section 4.3.2.A.
2. After the color space conversion, the green and yellow colors of the original image are segmented based on a user-defined input array. Pixels within the specified color values are transformed to white, while the remaining ones are transformed to black, creating a binary image.
3. With the colors segmented, only the shapes of the green and yellow signals, plus arbitrary objects of the same colors, remain. The identified shapes are submitted to OpenCV's template matching function, matchTemplate, to validate the presence of a known shape. This technique outputs a similarity coefficient between each of the signal shapes (arrow, cross, or parking sign) and the image shape.
4. If the similarity coefficient between the known and identified shapes output by matchTemplate is greater than a specified value, the template semaphore shape is considered detected.
5. However, to be considered recognized and to increase the accuracy of the algorithm, a semaphore shape is only output if it is detected N times consecutively, where N is a user-defined input. If no semaphore shape is detected N consecutive times within a user-defined timeout, a timeout message is thrown instead.
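The confirmation logic of steps 3–5 can be sketched as follows. Here match_score is a crude binary-overlap stand-in for OpenCV's matchTemplate similarity coefficient, and all names are illustrative, not the thesis code:

```python
def match_score(image, template):
    """Fraction of the template's white pixels also white in a same-size binary image."""
    total = sum(sum(row) for row in template)
    hits = sum(1 for y, row in enumerate(template)
               for x, v in enumerate(row) if v and image[y][x])
    return hits / total if total else 0.0

def recognize(frames, template, threshold=0.8, n_required=3):
    """Report recognition only after n_required consecutive above-threshold matches."""
    streak = 0
    for frame in frames:
        if match_score(frame, template) >= threshold:
            streak += 1
            if streak >= n_required:
                return True
        else:
            streak = 0  # a miss resets the consecutive-detection counter
    return False
```

A single frame that fails to match resets the counter, so sporadic false positives do not trigger a recognition, which is the purpose of the N-consecutive rule.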
4.6.1.B User-defined inputs
• Color range values - Represent the minimum and maximum values of the three channels of
CIELAB color space in the form of an array. These are used to detect the green and yellow colors.
• Path to the templates - The path to the folder where the template shapes are stored.
• Template names - The image filenames that correspond to each distinct semaphore.
• Similarity coefficient - A percentage value, ranging from 0 to 1, for two shapes to be considered
the same. The higher the value, the more alike the two shapes need to be for a detection to be
successful.
• Consecutive detections - Integer number of consecutive detections of a shape for it to be con-
sidered as recognized.
4.6.1.C Implementation details
Since image processing techniques are generally complex and computationally heavy, this module is implemented as a ROS Service instead of a ROS Topic. This way, the node is not continuously checking for semaphores; instead, it only works on demand, when necessary. Provided that the location of the semaphores is known, the navigation and decision-making modules, developed in a parallel thesis, request the recognition of the semaphore based on the location of the robot, i.e., when it is near the zebra crossing.
Although there are red, green, and yellow semaphores, the current implementation only checks for green and yellow ones. Red semaphores are deliberately ignored: since the robot is already stopped when checking for semaphores, informing the navigation and decision-making modules that the semaphore is red, i.e., that the car should remain stopped, would not change the current state of the car.
Regarding the segmentation of the green and yellow colors, the range values provided by the user can also be obtained from the calibration script described in Section 4.3.2.C.
With the same script, it’s also possible to create the template files that are used on the template
matching step by saving the output binary and cropping the signal shape from the binary image. The
captured templates are depicted in Fig. 4.5.
4.6.2 Traffic Sign Detection and Recognition
To carry out the task of detecting the traffic signs, the neural network developed by Redmon and Divvala [104] is utilized. It's chosen over other machine learning methods due to its real-time applicability, as it is able to achieve an output rate of 50 Hz in its latest iteration [105].
(a) Left Template (b) Forward Template
(c) Right Template (d) Parking Template
Figure 4.5: Binary templates used by the template matching method.
Their neural network is called You Only Look Once (YOLO) due to its method of operation. Contrary to other neural network approaches, where a region-based classifier searches for an object within an image several times, the authors of YOLO solve the problem in a single pass over the image. This is accomplished by segmenting the image into a grid where, for each cell, a bounding box for a hypothetical object is created and a class probability is estimated. By multiplying the probability of the class by the confidence of the bounding box containing an object, a weight is calculated for each bounding box. The weights are thresholded and, when above a specified value, the bounding box and identified class are considered detected.
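The weighting step described above amounts to the following sketch, where each candidate carries a class probability and a box objectness confidence (the data layout and names are illustrative assumptions, not YOLO's actual tensors):

```python
def score_boxes(candidates, threshold=0.5):
    """Keep boxes whose class probability x objectness confidence clears the threshold.

    candidates: iterable of (box, class_name, class_prob, objectness) tuples.
    """
    detections = []
    for box, cls, class_prob, objectness in candidates:
        weight = class_prob * objectness  # per-box weight, as described for YOLO
        if weight >= threshold:
            detections.append((box, cls, weight))
    return detections
```

For instance, a box with class probability 0.9 and objectness 0.8 (weight 0.72) survives a 0.5 threshold, while one with 0.6 and 0.4 (weight 0.24) is discarded.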
4.6.3 Implementation Details
YOLO is implemented on ROS in the darknet_ros package, the only required inputs being the trained weights of each class and the class identifiers. The classes are easily trained by creating bounding boxes over a data set of collected images containing the required classes and providing the required training time. Given these inputs, the package outputs the identified classes along with a confidence value for each detection and the bounding boxes locating the objects in the original image.
However, given that the traffic signs are all very similar within each class (those classes being informative, mandatory, and danger signs), and given that the wide-angle camera has a low resolution, YOLO is implemented in two layers to increase the success rate of the traffic sign recognition.
The first layer detects the class of the traffic sign while the second takes the bounding box of the
detected sign to recognize only the interior part of the sign. This approach can be visualized in Fig. 4.6.
(a) Detection of the traffic sign class (b) Recognition of the interior part of the sign
Figure 4.6: Visual demonstration of the YOLO implementation.
5 Evaluation
Contents
5.1 Performance of the modules
5.2 Performance of the system on the competition environment
To grade the performance of each module and ensure that the specifications described in Table 1.1 are met, each individual module is assessed separately, along with its contribution to the system as a whole during the different competition challenges. This assessment is carried out in the Autonomous Driving challenge of the Robotics Open and in a testing environment that mimics the Robotics Open scenario. The two environments can be visualized in Fig. 5.1.
(a) Testing environment (b) Competition environment
Figure 5.1: Comparison between the testing and the (real) competition environment.
ROS built-in functionalities, namely its 3D visualizer tool, RViz, and its bag files, which capture and store ROS messages, are exploited both to visually assess the accuracy and to systematically test the robot with the same source data every time.
To ensure that the navigation and decision-making modules have both accurate and correct infor-
mation to plan a safe and fast route, the main characteristics on which each module can be evaluated
are:
• Accuracy of the obtained values.
• Computation cost and real-time applicability.
• Ratio between correct and incorrect operations.
All the output data displayed or described in this chapter is captured directly from the robot. To recap, the robot has an Intel Core i7-7700 Quad-Core CPU at 3.6 GHz, 16 GB of DDR4 Random Access Memory (RAM), and an NVIDIA GeForce GTX 1050 Ti 4 GB GPU, on which all data was processed.
5.1 Performance of the modules
For the evaluation of the distinct software nodes according to the determined evaluation characteristics, different environment setups are created to independently test and showcase the strengths and weaknesses of each module. Unless specified otherwise, the input parameters of each module used in the tests are the same values described in Chapter 4.
5.1.1 Ground and obstacles segmentation
5.1.1.A Setup
To evaluate the performance of the ground and obstacle segmentation algorithm, it’s submitted to
two different scenarios, namely:
Scenario A - An obstacle is placed in front of the vehicle at various distances to check whether there's any iteration in which the segmentation algorithm doesn't find the ground and, consequently, provides no output. The distance between the obstacle and the Kinect ranges from 70 cm to 5 m, the latter being the maximum range specified by Microsoft. The specification from the navigation and decision-making modules is that no part of the robot is ever closer than 30 cm to an obstacle. Since the Kinect points forward and the front of the vehicle is 40 cm away from the camera, the total distance from an obstacle to the camera is at least 70 cm, rendering it the minimum distance.
Scenario B - Two full laps in different directions are performed to assess whether the segmentation algorithm is affected by the motion of the robot.
In both tests, the number of iterations the algorithm requires to identify the ground plane and the output rate of the segmented clouds are also assessed.
5.1.1.B Results
By capturing 10000 different point clouds with distances ranging from 70 cm to 5 m, it's possible to conclude that the segmentation module's algorithm is always able to identify the ground, and in a single iteration. Even in the worst case, with an obstacle at 70 cm from the camera, the plane with the most points is always the ground. This is the worst case because it's at that distance that the ground has the fewest points, as can be visualized in Figure 5.2.
Figure 5.2: Visual demonstration of the number of points belonging to the ground when an obstacle is at 10 cm from the car's front.
Upon the completion of two full laps, Scenario B, the 1033 different point clouds captured demonstrated exactly the same results as the previous scenario.
Both scenarios demonstrate that the algorithm is capable of achieving an output rate of 12.4 Hz, rounded down to one decimal place. Since the output rate of the Kinect's point cloud from the iai_kinect package is 15 Hz, the loss is not significant. To put this into perspective, some of the cheapest real laser scanners cost nearly 7 times more than the Kinect and are only capable of outputting a laser scan at 10 Hz. The described results are summarized in Table 5.1.
Table 5.1: Ground and obstacles segmentation module performance in the different testing scenarios, on 10000 different point clouds for Scenario A and 1030 for Scenario B.

Scenario | Successful Segmentation Ratio [%] | Output Rate [Hz]
A        | 100                               | 12.49
B        | 100                               | 12.43
5.1.2 Road lines to virtual walls
5.1.2.A Setup
To assess the quality of the novel algorithm that generates a virtual laser scan from the identified continuous road lines, it's put to the test in two different scenarios, namely:
Scenario A - Since the density of points of the ground point cloud output by the segmentation node
decreases as the distance to the Kinect increases, the number of points that belong to a
line also decreases. As a consequence, beyond a certain distance, the line is no longer detected. To
identify the distance at which a line stops being detected, a line parallel to the car is observed in RViz.
Scenario B - At distances ranging from the identified maximum distance down to the minimum dis-
tance, a line perpendicular to the car is continuously converted to a laser scan. All the distances
output by each beam are captured during 5000 iterations. From the acquired data, an
assessment is made of the precision of the sensor and algorithms involved as a function of distance.
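The line-to-scan conversion evaluated here can be sketched as binning the detected line points by bearing and keeping the closest range per beam. The sketch below is a simplified stand-in with assumed parameters (beam count, 70 º field of view, 2.5 m cut-off), not the thesis node itself:

```python
import numpy as np

def points_to_virtual_scan(xy, n_beams=360, fov=np.radians(70.0), max_range=2.5):
    """Convert 2-D line points (robot frame, x forward) into laser-scan ranges.

    Each beam keeps the closest point inside its angular bin, so a painted
    line behaves like a solid wall for an occupancy-grid localizer."""
    angles = np.arctan2(xy[:, 1], xy[:, 0])
    ranges = np.hypot(xy[:, 0], xy[:, 1])
    keep = (np.abs(angles) <= fov / 2) & (ranges <= max_range)
    scan = np.full(n_beams, np.inf)          # inf where no line point is seen
    bins = ((angles[keep] + fov / 2) / fov * (n_beams - 1)).astype(int)
    np.minimum.at(scan, bins, ranges[keep])
    return scan

# Scenario B analogue: a straight line 1 m ahead, perpendicular to the car.
line = np.c_[np.full(200, 1.0), np.linspace(-0.5, 0.5, 200)]
scan = points_to_virtual_scan(line)
finite = scan[np.isfinite(scan)]
print(round(finite.min(), 3))                # closest beam reports ~1.0 m
```

Keeping the minimum range per bin is the design choice that makes the virtual scan conservative: if several line points fall into one beam, the nearest one defines the "wall".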
5.1.2.B Results
The tests carried out in Scenario A can be observed in Fig. 5.3. The image demonstrates that an
observed line is only continuous up to around 2.5 m. From that point onward, the density of points
decreases to the point that it is not possible to assess whether the observed points belong to a line or are simply
noise. Additionally, in the same figure, the parking space lines demonstrate that the lines are linear only
until approximately 2.5 m. For that reason, the ground-segmented point cloud is only filtered up to that
distance.
Figure 5.3: Visual demonstration of the effect of distance on the number of points belonging to a track line.
By collecting the output laser scan from Scenario B, it is possible to measure the precision of the
algorithm by calculating the standard deviation. For an accuracy test, a sensor with higher accuracy
than the sensor under consideration is required. Since one is not available, a precision test was carried out
instead. The results of the test are summarized in Table 5.2.
Table 5.2: Results of the virtual walls node based on distance for 5000 iterations.
Distance [m]   Mean [m]        Standard Deviation [m]
1              1.007820125     0.006284148
1.5            1.493256291     0.01143513
2              1.99648467      0.015652938
2.5            2.490326246     0.019847568
These results demonstrate that, in the worst case scenario, the error is within ±2 cm approximately
68% of the time.
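The precision figures above follow from the captured beam ranges: under approximately normal error, about 68 % of samples fall within one standard deviation of the mean. A minimal sketch of the computation, with simulated readings standing in for the 5000 captured ones at 2.5 m:

```python
import numpy as np

# Simulated beam readings, assumed normally distributed around the
# Table 5.2 values (mean 2.490 m, std 0.0198 m at the 2.5 m mark).
rng = np.random.default_rng(0)
readings = rng.normal(loc=2.490, scale=0.0198, size=5000)

mean = readings.mean()
std = readings.std(ddof=1)                       # sample standard deviation
within_one_sigma = np.mean(np.abs(readings - mean) <= std)

print(round(mean, 2), round(within_one_sigma, 2))  # ~2.49 and ~0.68
```

The same mean/standard-deviation computation applies unchanged to the real captured data; only the `readings` array differs.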
5.1.3 Obstacle detection
5.1.3.A Setup
For the same reasons described in the assessment of the virtual walls node, Section 5.1.2, only the
precision of the obstacle detection node is assessed. The laser scan output of the node is captured for
5000 iterations with an obstacle at distances ranging from 1 m to 5 m. The distance of each beam is stored to assess
the precision of the sensor and algorithms involved in relation to the distance.
5.1.3.B Results
From the experimental tests, it is observed that the node is capable of outputting at the same rate
as the segmented obstacles point cloud at its input, i.e., 12 Hz. From the captured data, the results
obtained are summarized in Table 5.3.
Table 5.3: Results of the obstacle detection node based on distance for 5000 iterations.
Distance [m]   Mean [m]        Standard Deviation [m]
1              1.000007447     0.002642223
2              2.002703115     0.001730372
3              2.997153746     0.001660753
4              4.004420191     0.002361059
5              4.997393216     0.003318544
One interesting fact is that, although the sensor used and the downsampling process are the same
for both inputs of the virtual walls and obstacle detection nodes, the precision of the obstacle detection
node is much higher. This can be explained by the fact that the floor is almost parallel to the camera's
optical axis while the obstacles are perpendicular to it. As demonstrated in Fig. 5.3, the density of
points on the floor decreases at a high rate as the distance increases. This does not happen with
perpendicular obstacles, which explains the difference in precision.
Another curious fact is that the optimal distance to detect an obstacle seems to be between 2 m and
3 m, since those distances present the lowest standard deviation values. This is contrary to expectation,
since the error would be expected to grow with distance.
In the worst-case scenario, the error is within ±0.003318544 m 68% of the time and within ±0.009955632
m, very close to 1 cm, 99.7 % of the time. Even though the field of view of the Kinect One is fairly limited
(70 º), since it allows the detection of obstacles up to 5 m with only ± 1 cm of error, it can replace a real
laser scanner to detect the obstacles in the competition.
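The ± 1 cm bound quoted above is simply the three-sigma interval of the worst-case standard deviation from Table 5.3:

```python
sigma = 0.003318544           # worst-case standard deviation (5 m, Table 5.3)
print(round(1 * sigma, 9))    # one-sigma (68 %) bound
print(round(3 * sigma, 9))    # 0.009955632 -> the three-sigma (99.7 %) bound
```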
5.1.4 Semaphore Detection
5.1.4.A Setup
For the assessment of the ratio between successful and unsuccessful detections of semaphores, the
car is placed in the optimal position and in a hypothetical worst-case position. The optimal position,
Scenario A, is the starting position, where the front of the car is directly below the screen and the camera
is pointing directly at the screen. The hypothetical worst-case position, Scenario B, corresponds to a
translation of 12.5 cm in position and a clockwise rotation of the vehicle of 25 º. Those values
correspond to an error 5 times larger than the one specified in Table 1.1. The semaphore detection service
is called 500 times for each distinct semaphore to assess the ratio between correct and incorrect
detections.
5.1.4.B Results
To assess the different causes of failure, no detections (service timeouts) are distinguished from in-
correct detections of a given semaphore sign, i.e., when a semaphore different from the one displayed
on the screen is marked as detected. The success ratio for each scenario and each semaphore, along
with the number of correct, incorrect, and timed-out calls, is summarized in Table 5.4.
The results of this test demonstrate that the applied template matching technique is able to detect
arrows with a higher success rate than the 'P' shape of the parking semaphore. This is probably due to
the complexity of that shape when compared to the arrows.
As expected from template matching techniques, the success rate is heavily affected when the
observed shape is resized and rotated. In fact, in the optimal scenario, the results are satisfactory
Table 5.4: Results of the semaphore detection node for the different scenarios.
Semaphore   Success ratio [%]   Correct calls   Incorrect calls   Timeout calls
               A       B           A      B         A      B          A      B
Forward      96.2    82.0        481    410        2     49         17     41
Left         97.8    87.4        489    437        0     23         11     34
Right        97.4    93.8        487    469        0      0         13     31
Park         91.6    77.4        458    387        0      0         42    113
as the robot detects the semaphore correctly at least 9 out of 10 times. However, in the worst-case
scenario, the detected shapes appear rotated, causing the number of incorrect calls to increase as well.
Taking the Forward semaphore as an example: since the vehicle is rotated clockwise
in the present test, the Forward arrow is also rotated clockwise, making it more similar to the
Right semaphore. The same happens with the Left semaphore, which explains the number of incorrect
calls for both semaphores. The Right semaphore, however, even though rotated, is still fairly
similar to its original shape. If a counter-clockwise test were performed, the success rate of the Right
semaphore would be expected to decrease and that of the Left semaphore to increase, for
the reasons described.
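The success ratios in Table 5.4 follow directly from the call counts, as each semaphore is requested 500 times per scenario; for example, for the Forward semaphore:

```python
CALLS_PER_SEMAPHORE = 500     # service invocations per semaphore and scenario

def success_ratio(correct_calls):
    """Success ratio [%] as reported in Table 5.4."""
    return 100 * correct_calls / CALLS_PER_SEMAPHORE

print(success_ratio(481))     # Forward, Scenario A -> 96.2
print(success_ratio(410))     # Forward, Scenario B -> 82.0
```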
5.1.5 Traffic Sign Recognition
5.1.5.A Setup
Since the traffic sign recognition needs to be performed during a challenge, the signs have to be de-
tected while the robot is moving. However, during the detection challenge, the velocity component is not
assessed, only the success rate of the traffic sign recognition. For that reason, to assess the success
rate of the traffic sign recognition node, in collaboration with Francisco's navigation and decision-making
modules from [6], the car is programmed to complete 100 laps at 1 m/s with all traffic signs randomly
spread across the track. The test is repeated 5 times, changing the position of the signs each time.
5.1.5.B Results
With the implementation of the two layers, the detection of the traffic sign shape and, then, the recognition
of the interior shapes of the traffic sign, YOLO's machine learning algorithm is always able to recognize
all traffic signs with a 100 % success rate in the test environment.
Although the competition's data set is small when compared to the number of real traffic signs,
the capability of the machine learning algorithm is demonstrated. Compared to the image processing
technique (from the semaphore detection node), the effort it takes to train the neural
network is relatively small compared to the effort of implementing an image processing algorithm
able to detect the same object. Also, by training the neural network with rotated traffic signs, it is
also possible to detect rotated objects. For this to happen with the image processing approach, the
template matching technique would have to be applied to the rotated objects too, rendering it a less scalable
approach.
5.1.6 Global Map
5.1.6.A Setup
To grade the quality of the map generated by the global map node, the map is visually assessed.
A qualitative analysis is carried out instead of a quantitative one, as there is no available sensor capable of
measuring the exact location of the track's lines, precluding an accuracy assessment.
5.1.6.B Results
After generating 10 maps with GMapping and visually assessing them, the map most similar
to the real track is the one presented in Fig. 5.4.
Figure 5.4: Global map of the track captured by GMapping on the test environment.
In this figure, it is possible to observe that the captured map is very similar to the shape of the real
track and that it even captures the track's irregularities, for instance, the interior part of the curve
on the right side of the image. The overall quality of the map is satisfactory and it greatly increases
the scalability of the project, as the approach can be applied to virtually any track with continuous lines. The small
irregularities of the generated map are: artifacts near the lines where a loop closure seems to have
occurred and readjusted the observed lines, which overlap but are not totally
coincident; the parallel parking area appearing slightly curved although it is straight in the test environment; and the
zebra crossing displaying a high amount of noise, even though the essence of the zebra shape is captured.
5.1.7 Global Localization
5.1.7.A Setup
Since the test environment is not equipped with an external system capable of providing a ground truth
for the robot, i.e., a more accurate position than the one estimated by the robot itself, to assess
the accuracy of the global localization node the robot is commanded to complete 50 full laps in which
the starting and ending positions are the same. With this, the end pose provides the absolute error of
the global localization at the end of a challenge. This test is very relevant since, by the rules defined
for the competition, the robot needs to start and stop on the zebra line, which is 5 cm thick,
in line with our position accuracy specification of ± 2.5 cm. The same test is performed
at different velocities to evaluate their effect on the global localization.
5.1.7.B Results
From the tests carried out, it is determined that the final localization error increases sharply with
velocity, roughly doubling for each 0.5 m/s increment above 1 m/s. These results are demonstrated in
Table 5.5, which relates the standard deviation to the velocity of the vehicle.
Table 5.5: Results of the global localization node at the end of 50 full laps, for different velocities.
Velocity [m/s]   Position Standard Deviation [m]   Orientation Standard Deviation [rad]
0.5              0.02230471                        0.010995574
1.0              0.02636126                        0.024260076
1.5              0.04606525                        0.036600291
2.0              0.09157002                        0.052780230
Given that the car travels 36 m in each full lap, and knowing that the car's position error is approximately
± 10 cm 68 % of the time in the worst-case scenario, it is demonstrated that the global localization
node is able to correct the majority of the odometry drift and provide an accurate absolute position
estimate to the navigation and decision-making modules.
Even though the specified localization accuracy of ± 2.5 cm is only met for a continuous velocity of
0.5 m/s, the orientation error is kept within ± 3 º 68% of the time in the worst-case scenario, which
is within the established specifications.
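The ± 3 º figure corresponds to the worst-case orientation standard deviation in Table 5.5 converted to degrees:

```python
import math

sigma_rad = 0.052780230                   # orientation std at 2.0 m/s (Table 5.5)
print(round(math.degrees(sigma_rad), 2))  # ~3.02, i.e. the quoted ± 3 degrees
```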
To compare the accuracy of the global localization with that of the relative pose estimation node, the paths
output by both are depicted in Fig. 5.5, taken from one of the tests at 1 m/s.
Figure 5.5: Visual demonstration of the path provided by the global localization, represented in green, and by the relative pose estimation, represented in red.
In the figure, the lack of accuracy of the relative pose estimation node and the drift accumulated over
the course of two full laps are noticeable, justifying the inclusion of the global localization node to
correct the overall drift and increase the accuracy to the specified values.
5.2 Performance of the system on the competition environment
To assess the performance of the whole system, it is put to the test in the Portuguese Robotics Open 2018
Autonomous Driving challenge. The objective of the competition is to demonstrate a proof-of-concept
in which the system is evaluated in a real scenario by a third-party entity. Based on the performance
of the system, the modules are iterated and improved, taking into consideration the observed behavior
and penalties.
5.2.1 Robotics Open Environment
In April 2018, at the Portuguese Robotics Open, the designed system was put to its initial test. The
majority of the challenges were successfully completed and the system achieved
2nd place, very close to 1st in terms of score, as demonstrated in Appendix B.
This proof-of-concept validated the success of the current approach. Contrary to the other teams'
approaches, where the robot typically follows a line until a specific condition is met, for instance, when the
dashed line is no longer visible or the zebra crossing appears, the present system is fully aware of its
absolute localization within the track, along with the absolute positions of the obstacles in it,
allowing it to plan the best trajectory ahead of time instead of reacting to such
specific conditions or obstacles.
The system is designed in such a way that, even if the environment changes, the robot is able
to generate a new map of the environment, provided that it contains white lines, and a new optimal
trajectory can be given to the robot. This makes the autonomous vehicle system more scalable than a
simple line-follower approach.
During the competition, the car's target speed was set to 1 m/s as the system was still under de-
velopment. The nodes were not fully optimized and, under some circumstances, the robot's absolute
position would jump by approximately half a meter. This heavily affected the navigation and
decision-making modules and, for that reason, the target speed was set to a lower value than specified.
The first iteration of the system did not include the traffic sign recognition node, so that
challenge was not completed.
Additionally, regarding the modules of the present thesis, the Kinect One was heavily affected by
the environment's infra-red emissions: a number of random points would be visible in the Kinect's
point cloud, even though they were not visible to the naked eye. This not only affected the performance
of the Kinect but also invalidated its use as an obstacle detector, since the provided data was
unreliable. In order to mitigate this problem, a real laser scanner had to be added to the robot during the
competition. In the testing environment the aforementioned problem is also visible, albeit at a very
small fraction of what happened in the FNR environment.
To conclude, as also demonstrated in the nodes' evaluation sections and in Francisco's thesis [6],
the final iteration of the robot is able to successfully execute all the challenges within the defined spec-
ifications, with the exception of the target speed. Even though it is able to navigate at that speed, given
the length of the track, the localization error, and the control instability, the car-like robot commits a large
number of penalties. Since these heavily affect the scores, lower speeds are targeted. Nevertheless,
the speed of 1 m/s almost doubles all of the other teams' speeds. This demonstrates that, even though
the specified goals are ambitious, the objectives have been met.
6 Conclusions and Future Work

Contents
6.1 Conclusions
6.2 Future work
6.1 Conclusions
This document presents a proposal of a system with the objective of competing in the Autonomous
Driving challenge of the Portuguese Robotics Open. It focuses on the necessary hardware and elec-
tronics that serve as the basis of the designed autonomous driving robotics platform, and on the
modules necessary to provide absolute localization within a known map and to perceive the surrounding objects
of the environment, be they obstacles, semaphores, traffic signs, or construction cones.
To address this problem, state-of-the-art sensors and methods used in the literature are reviewed
and implemented. The proposed system takes advantage of state-of-the-art methods to assess its
absolute position within a map and to plan the optimal trajectory, presenting an uncommon and unique
solution in the event's history.
This thesis outlines all the necessary steps to make the autonomous driving vehicle a reality. It starts
with the design of the basic electronic components and circuitry to power and support the low-level
sensors and actuators, goes through the creation of a hardware interface that bridges the
hardware and the Robot Operating System, and finishes with the implementation and evaluation of
the proposed software nodes.
To address the environment's lack of features for applying a global localization algorithm,
a novel technique to generate virtual walls based on the track's lines is developed, enabling occupancy-
grid-based localization algorithms like MCL. With these virtual walls, it is possible to map virtually
any environment filled with continuous lines and localize the robot in it. This approach provides a scal-
able solution should the map of the environment change in subsequent years. Additionally, using
the same sensor, the Kinect One, an approach is also presented to detect the navigable path and
the obstacles standing in it, together with two distinct methods to recognize the different signs present in the
environment, namely the semaphores and the traffic signs.
The developed system is able to localize the robot, at a constant velocity of 1 m/s, with an expected
error of ± 0.0263 m in position and ± 1.375 º in orientation. The robot is able to meet the
specified ± 0.025 m at lower velocities; nonetheless, during the challenges, the robot is able to finish
within the expected stopping area of 0.05 m most of the time. Given that speed is a big factor in
the score and that the expected error at 1 m/s is very close to the specified limit, this is set as the
target velocity. Even though the robot is able to reach the defined target speed of 2 m/s, this ambitious
velocity is almost impracticable due to the size of the track, the localization error at that velocity, and the
instability of the control at that pace.
Regarding the detection of obstacles, these are detected within a range of 0.5 m to 5 m with an
expected error of approximately 1 cm 99.73 % of the time, which is five times less than specified, and
at a rate of 12 Hz, which is more than double the specified rate.
The pragmatic solution presented for the detection of the semaphores is capable of fulfilling the
objectives of the competition and is demonstrated to be a viable solution when the data set is small and
when the control of the car is optimal, i.e., if the car is always able to point directly at the semaphores.
The neural network approach for the traffic signs is capable of achieving a 100 % success rate and
could also be used for the semaphores to increase the success rate when the semaphores appear rotated,
i.e., when the car is not pointing directly at them.
The first iteration of the designed system was showcased at the Portuguese Robotics Open competition, held in April
2018, achieving 2nd place, very close in terms of score to the 1st position, confirming the success of
the concept and its advantages. The final iteration of the autonomous driving system presents a more
stable solution, demonstrated to be able to address all of the competition challenges.
6.2 Future work
Given that an academic robotics platform for autonomous driving is built, it can serve as a basis for
a forthcoming work. Below, it’s presented a list of the possible improvements on the presented work:
• Implementation of the ground segmentation and virtual walls nodes on the GPU. Although the cur-
rent implementation achieves a rate of 12 Hz, it does so only by downsampling the
Kinect's point cloud. The accuracy of the algorithm could increase if the voxel size of
the downsampling process were reduced while maintaining a high output rate.
• Replacing the implemented linear power regulators with switching regulators to decrease the overall
energy dissipation of the circuit and, consequently, the temperature inside the robot's frame.
• Validating whether adding a wheel odometry system to each wheel, which was not possible due
to the suspension of the vehicle used, increases the accuracy of the relative pose estimation.
• Evaluating the developed system in a different environment to assess the scalability of the devel-
oped solution.
Bibliography
[1] (2017) Autonomous driving competition overview. [Online]. Available: robotica2017.isr.uc.pt/index.php/en/competitions/major/autonomous-driving
[2] [Online]. Available: https://ec.europa.eu/commission/commissioners/2014-2019/moedas/announcements/european-conference-connected-and-automated-driving-together-shaping-future_en
[3] [Online]. Available: https://ec.europa.eu/transport/road_safety/specialist/statistics_en
[4] [Online]. Available: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115
[5] “SAE J3016: Taxonomy and Definitions for Terms Related to On-Road Motor Vehicle Automated
Driving Systems,” Society of Automotive Engineers, Standard, Sep. 2016.
[6] F. Varandas, “Autonomous driving of competition robot - navigation and decision making,” 2018.
[7] S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT press, 2005.
[8] S. Thrun, J.-S. Gutmann, D. Fox, W. Burgard, B. Kuipers et al., “Integrating topological and metric
maps for mobile robot navigation: A statistical approach,” in AAAI/IAAI, 1998, pp. 989–995.
[9] A. Saffiotti, “Handling uncertainty in control of autonomous robots,” Applications of Uncertainty
Formalisms, pp. 198–224, 1998.
[10] H. W. Sorenson and A. R. Stubberud, “Non-linear filtering by approximation of the a posteriori
density,” International Journal of Control, vol. 8, no. 1, pp. 33–51, 1968.
[11] R. E. Kalman et al., “A new approach to linear filtering and prediction problems,” Journal of basic
Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[12] R. Negenborn, “Robot localization and kalman filters,” Utrecht Univ., Utrecht, Netherlands, Mas-
ter’s thesis INF/SCR-0309, 2003.
[13] G. Bishop and G. Welch, “An introduction to the kalman filter,” Proc of SIGGRAPH, Course, vol. 8,
no. 27599-23175, p. 41, 2001.
[14] P. S. Maybeck, Stochastic models, estimation, and control. Academic press, 1982, vol. 3.
[15] M. Grewal and A. Andrew, “Kalman filtering,” in International Encyclopedia of Statistical Science.
Springer, 2001.
[16] E. A. Wan and R. Van Der Merwe, “The unscented kalman filter for nonlinear estimation,” in Adap-
tive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC.
The IEEE 2000. Ieee, 2000, pp. 153–158.
[17] S. J. Julier and J. K. Uhlmann, “A new extension of the kalman filter to nonlinear systems,” in Int.
symp. aerospace/defense sensing, simul. and controls, vol. 3, no. 26. Orlando, FL, 1997, pp.
182–193.
[18] S. Se, D. Lowe, and J. Little, “Local and global localization for mobile robots using visual land-
marks,” in Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Con-
ference on, vol. 1. IEEE, 2001, pp. 414–420.
[19] D. Fox, W. Burgard, and S. Thrun, “Markov localization for mobile robots in dynamic environments,”
Journal of Artificial Intelligence Research, vol. 11, pp. 391–427, 1999.
[20] D. Fox, W. Burgard, F. Dellaert, and S. Thrun, “Monte carlo localization: Efficient position estima-
tion for mobile robots,” AAAI/IAAI, vol. 1999, no. 343-349, pp. 2–2, 1999.
[21] K. Kanazawa, D. Koller, and S. Russell, “Stochastic simulation algorithms for dynamic probabilis-
tic networks,” in Proceedings of the Eleventh conference on Uncertainty in artificial intelligence.
Morgan Kaufmann Publishers Inc., 1995, pp. 346–351.
[22] F. Dellaert, D. Fox, W. Burgard, and S. Thrun, “Monte carlo localization for mobile robots,” in
Robotics and Automation, 1999. Proceedings. 1999 IEEE International Conference on, vol. 2.
IEEE, 1999, pp. 1322–1328.
[23] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, “Using the condensation algorithm for robust, vision-
based mobile robot localization,” in Computer Vision and Pattern Recognition, 1999. IEEE Com-
puter Society Conference on., vol. 2. IEEE, 1999, pp. 588–594.
[24] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, “Robust monte carlo localization for mobile robots,”
Artificial intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.
[25] L. Zhang, R. Zapata, and P. Lepinay, “Self-adaptive monte carlo localization for mobile robots using
range sensors,” in Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International
Conference on. IEEE, 2009, pp. 1541–1546.
[26] J. Borenstein and L. Feng, “Measurement and correction of systematic odometry errors in mobile
robots,” IEEE Transactions on robotics and automation, vol. 12, no. 6, pp. 869–880, 1996.
[27] C. M. Wang, “Location estimation and uncertainty analysis for mobile robots,” in Robotics and
Automation, 1988. Proceedings., 1988 IEEE International Conference on. IEEE, 1988, pp. 1231–
1235.
[28] N. L. Doh and W. K. Chung, “A systematic representation method of the odometry uncertainty of
mobile robots,” Intelligent Automation & Soft Computing, vol. 12, no. 4, pp. 397–409, 2006.
[29] G. Klein and D. Murray, “Parallel tracking and mapping for small ar workspaces,” in Mixed and
Augmented Reality, 2007. ISMAR 2007. 6th IEEE and ACM International Symposium on. IEEE,
2007, pp. 225–234.
[30] D. Nister, O. Naroditsky, and J. Bergen, “Visual odometry for ground vehicle applications,” Journal
of Field Robotics, vol. 23, no. 1, pp. 3–20, 2006.
[31] F. Steinbrucker, J. Sturm, and D. Cremers, “Real-time visual odometry from dense rgb-d im-
ages,” in Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference
on. IEEE, 2011, pp. 719–722.
[32] A. Surrecio, U. Nunes, and R. Araujo, “Fusion of odometry with magnetic sensors using kalman
filters and augmented system models for mobile robot navigation,” in Proceedings of the IEEE
International Conference on Industrial Electronics, Dubrovnik, Croacia, 2005.
[33] V. A. Ajay, A. P. Suherlan, G. S. Soh, S. Foong, K. Wood, and K. Otto, “Localization and trajectory
tracking of an autonomous spherical rolling robot using imu and odometry,” in Proceedings of
the ASME 2015 International Design Engineering Technical Conferences and Computers and
Information in Engineering Conference, Boston, MA, USA, 2015, pp. 2–5.
[34] H. Chung, L. Ojeda, and J. Borenstein, “Accurate mobile robot dead-reckoning with a precision-
calibrated fiber-optic gyroscope,” IEEE Transactions on Robotics and Automation, vol. 17, no. 1,
pp. 80–84, 2001.
[35] M. Marron, J. C. Garcia, M. Sotelo, E. Lopez, and M. Mazo, “Fusing odometric and vision data with
an ekf to estimate the absolute position of an autonomous mobile robot,” in Emerging Technologies
and Factory Automation, 2003. Proceedings. ETFA’03. IEEE Conference, vol. 1. IEEE, 2003, pp.
591–596.
[36] S. B. Goldberg and L. Matthies, “Stereo and imu assisted visual odometry on an omap3530 for
small robots,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Com-
puter Society Conference on. IEEE, 2011, pp. 169–176.
[37] K. Berntorp, K.-E. Arzen, and A. Robertsson, “Sensor fusion for motion estimation of mobile robots
with compensation for out-of-sequence measurements,” in Control, Automation and Systems (IC-
CAS), 2011 11th International Conference on. IEEE, 2011, pp. 211–216.
[38] S. Thrun et al., “Robotic mapping: A survey,” Exploring artificial intelligence in the new millennium,
vol. 1, pp. 1–35, 2002.
[39] M. J. Mataric, “A distributed model for mobile robot environment-learning and navigation,” 1990.
[40] B. Kuipers and Y.-T. Byun, “A robot exploration and mapping strategy based on a semantic hi-
erarchy of spatial representations,” Robotics and autonomous systems, vol. 8, no. 1, pp. 47–63,
1991.
[41] J. Andrade-Cetto and A. Sanfeliu, “Topological map learning for a mobile robot in indoor envi-
ronments,” in Proceedings of the 9th Spanish Symposium on Pattern Recognition and Image
Analysis, vol. 1, 2001, pp. 221–226.
[42] S. Panzieri, F. Pascucci, R. Setola, and G. Ulivi, “A low cost vision based localization system for
mobile robots,” target, vol. 4, p. 5, 2001.
[43] T. Bailey, “Mobile robot localisation and mapping in extensive outdoor environments,” Ph.D. dis-
sertation, Univ. Sydney, Australian Ctr. Field Robotics, 2002.
[44] O. Pink, “Visual map matching and localization using a global feature map,” in Computer Vision
and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference on.
IEEE, 2008, pp. 1–7.
[45] H. Kim, D. Lee, T. Oh, H.-T. Choi, and H. Myung, “A probabilistic feature map-based localization
system using a monocular camera,” Sensors, vol. 15, no. 9, pp. 21 636–21 659, 2015.
[46] A. Elfes, “Sonar-based real-world mapping and navigation,” IEEE Journal on Robotics and Au-
tomation, vol. 3, no. 3, pp. 249–265, 1987.
[47] J. Borenstein and Y. Koren, “The vector field histogram-fast obstacle avoidance for mobile robots,”
IEEE transactions on robotics and automation, vol. 7, no. 3, pp. 278–288, 1991.
[48] A. Arleo, J. d. R. Millan, and D. Floreano, “Efficient learning of variable-resolution cognitive maps
for autonomous indoor navigation,” IEEE Transactions on Robotics and Automation, vol. 15, no. 6,
pp. 990–1000, 1999.
[49] G. K. Kraetzschmar, G. P. Gassull, K. Uhl, G. Pags, and G. K. Uhl, “Probabilistic quadtrees for
variable-resolution mapping of large environments,” in Proceedings of the 5th IFAC/EURON sym-
posium on intelligent autonomous vehicles. July, 2004.
[50] K. M. Wurm, A. Hornung, M. Bennewitz, C. Stachniss, and W. Burgard, “Octomap: A probabilis-
tic, flexible, and compact 3d map representation for robotic systems,” in Proc. of the ICRA 2010
workshop on best practice in 3D perception and modeling for mobile manipulation, vol. 2, 2010.
[51] G. Weiß and E. v. Puttkamer, “A map based on laserscans without geometric interpretation,” in
Intelligent Autonomous Systems, vol. 4, no. 2. IOS Press, 1995, pp. 403–407.
[52] F. Lu and E. E. Milios, “Optimal global pose estimation for consistent sensor data registration,”
in Robotics and Automation, 1995. Proceedings., 1995 IEEE International Conference on, vol. 1.
IEEE, 1995, pp. 93–100.
[53] J. L. Crowley, F. Wallner, and B. Schiele, “Position estimation using principal components of range
data,” Robotics and Autonomous Systems, vol. 23, no. 4, pp. 267–276, 1998.
[54] R. Sim and G. Dudek, “Learning visual landmarks for pose estimation,” in Robotics and Automa-
tion, 1999. Proceedings. 1999 IEEE International Conference on, vol. 3. IEEE, 1999, pp. 1972–
1978.
[55] I. Ulrich and I. Nourbakhsh, “Appearance-based place recognition for topological localization,” in
Robotics and Automation, 2000. Proceedings. ICRA’00. IEEE International Conference on, vol. 2.
IEEE, 2000, pp. 1023–1029.
[56] S. Segvic, A. Remazeilles, A. Diosi, and F. Chaumette, “A mapping and localization framework
for scalable appearance-based navigation,” Computer Vision and Image Understanding, vol. 113,
no. 2, pp. 172–187, 2009.
[57] D. F. Wolf and G. S. Sukhatme, “Towards mapping dynamic environments,” in Proceedings of the International Conference on Advanced Robotics (ICAR), 2003, pp. 594–600.
[58] D. Wolf and G. S. Sukhatme, “Online simultaneous localization and mapping in dynamic envi-
ronments,” in Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International
Conference on, vol. 2. IEEE, 2004, pp. 1301–1307.
[59] C. Häne, T. Sattler, and M. Pollefeys, “Obstacle detection for self-driving cars using only monocular cameras and wheel odometry,” in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 5101–5108.
[60] P. Fleischmann and K. Berns, “A stereo vision based obstacle detection system for agricultural
applications,” in Field and Service Robotics. Springer, 2016, pp. 217–231.
[61] D. Holz, S. Holzer, R. B. Rusu, and S. Behnke, “Real-time plane segmentation using RGB-D cameras,” in Robot Soccer World Cup. Springer, 2011, pp. 306–317.
[62] B. Peasley and S. Birchfield, “Real-time obstacle detection and avoidance in the presence of
specular surfaces using an active 3d sensor,” in Robot Vision (WORV), 2013 IEEE Workshop on.
IEEE, 2013, pp. 197–202.
[63] A. De La Escalera, L. E. Moreno, M. A. Salichs, and J. M. Armingol, “Road traffic sign detection
and classification,” IEEE transactions on industrial electronics, vol. 44, no. 6, pp. 848–859, 1997.
[64] W. Shadeed, D. I. Abu-Al-Nadi, and M. J. Mismar, “Road traffic sign detection in color images,”
in Electronics, Circuits and Systems, 2003. ICECS 2003. Proceedings of the 2003 10th IEEE
International Conference on, vol. 2. IEEE, 2003, pp. 890–893.
[65] N. Yabuki, Y. Matsuda, Y. Fukui, and S. Miki, “Region detection using color similarity,” in Circuits
and Systems, 1999. ISCAS’99. Proceedings of the 1999 IEEE International Symposium on, vol. 4.
IEEE, 1999, pp. 98–101.
[66] C.-Y. Fang, C.-S. Fuh, S.-W. Chen, and P.-S. Yen, “A road sign recognition system based on
dynamic visual model,” in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003
IEEE Computer Society Conference on, vol. 1. IEEE, 2003, pp. I–I.
[67] M.-Y. Fu and Y.-S. Huang, “A survey of traffic sign recognition,” in Wavelet Analysis and Pattern
Recognition (ICWAPR), 2010 International Conference on. IEEE, 2010, pp. 119–124.
[68] X. Gao, K. Hong, P. Passmore, L. Podladchikova, and D. Shaposhnikov, “Colour vision model-
based approach for segmentation of traffic signs,” Journal on Image and Video Processing, vol.
2008, p. 6, 2008.
[69] A. L. Yuille, D. Snow, and M. Nitzberg, “Signfinder: Using color to detect, localize and identify
informational signs,” in Computer Vision, 1998. Sixth International Conference on. IEEE, 1998,
pp. 628–633.
[70] N. Kehtarnavaz, N. C. Griswold, and D. Kang, “Stop-sign recognition based on color/shape pro-
cessing,” Machine Vision and Applications, vol. 6, no. 4, pp. 206–208, 1993.
[71] H. Fleyeh, “Traffic and road sign recognition,” Ph.D. dissertation, 2008.
[72] C. Schiekel, “A fast traffic sign recognition algorithm for gray value images,” in Computer Analysis of Images and Patterns. Springer, 1999, pp. 836–836.
[73] Y. Aoyagi and T. Asakura, “A study on traffic sign recognition in scene image using genetic algo-
rithms and neural networks,” in Industrial Electronics, Control, and Instrumentation, 1996., Pro-
ceedings of the 1996 IEEE IECON 22nd International Conference on, vol. 3. IEEE, 1996, pp.
1838–1843.
[74] G. Piccioli, E. De Micheli, P. Parodi, and M. Campani, “Robust road sign detection and recognition
from image sequences,” in Intelligent Vehicles’ 94 Symposium, Proceedings of the. IEEE, 1994,
pp. 278–283.
[75] N. Barnes and A. Zelinsky, “Real-time radial symmetry for speed sign detection,” in Intelligent
Vehicles Symposium, 2004 IEEE. IEEE, 2004, pp. 566–571.
[76] G. Loy and N. Barnes, “Fast shape-based road sign detection for a driver assistance system,” in
Intelligent Robots and Systems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International
Conference on, vol. 1. IEEE, 2004, pp. 70–75.
[77] E. Perez and B. Javidi, “Scale and illumination-invariant road sign detection,” in Lasers and Electro-
Optics Society 2000 Annual Meeting. LEOS 2000. 13th Annual Meeting. IEEE, vol. 2. IEEE, 2000,
pp. 748–749.
[78] M. Nakamura, S. Kodama, T. Jimbo, and M. Umeno, “Searching and recognition of road signpost
using ring detection network,” in Systems, Man, and Cybernetics, 1999. IEEE SMC’99 Conference
Proceedings. 1999 IEEE International Conference on, vol. 2. IEEE, 1999, pp. 190–195.
[79] G. Piccioli, E. De Micheli, P. Parodi, and M. Campani, “Robust method for road sign detection and
recognition,” Image and Vision Computing, vol. 14, no. 3, pp. 209–223, 1996.
[80] S. Azami, S. Katahara, and M. Aoki, “Route guidance sign identification using 2-d structural de-
scription,” in Intelligent Vehicles Symposium, 1996., Proceedings of the 1996 IEEE. IEEE, 1996,
pp. 153–158.
[81] Y. B. Lauziere, D. Gingras, and F. P. Ferrie, “A model-based road sign identification system,” in
Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE
Computer Society Conference on, vol. 1. IEEE, 2001, pp. I–I.
[82] H. Ohara, I. Nishikawa, S. Miki, and N. Yabuki, “Detection and recognition of road signs using
simple layered neural networks,” in Neural Information Processing, 2002. ICONIP’02. Proceedings
of the 9th International Conference on, vol. 2. IEEE, 2002, pp. 626–630.
[83] J. Miura, T. Kanda, S. Nakatani, and Y. Shirai, “An active vision system for on-line traffic sign
recognition,” IEICE TRANSACTIONS on Information and Systems, vol. 85, no. 11, pp. 1784–1792,
2002.
[84] C.-Y. Fang, C.-S. Fuh, P. Yen, S. Cherng, and S.-W. Chen, “An automatic road sign recognition
system based on a computational model of human recognition processing,” Computer vision and
Image understanding, vol. 96, no. 2, pp. 237–268, 2004.
[85] A. Shustanov and P. Yakimov, “CNN design for real-time traffic sign recognition,” Procedia Engineering, vol. 201, pp. 718–725, 2017.
[86] S. Lafuente-Arroyo, P. Gil-Jimenez, R. Maldonado-Bascon, F. Lopez-Ferreras, and S. Maldonado-Bascon, “Traffic sign shape classification evaluation I: SVM using distance to borders,” in Intelligent Vehicles Symposium, 2005. Proceedings. IEEE. IEEE, 2005, pp. 557–562.
[87] M. Shi, H. Wu, and H. Fleyeh, “Support vector machines for traffic signs recognition,” in Neural
Networks, 2008. IJCNN 2008.(IEEE World Congress on Computational Intelligence). IEEE Inter-
national Joint Conference on. IEEE, 2008, pp. 3820–3827.
[88] P. Gil-Jimenez, H. Gomez-Moreno, P. Siegmann, S. Lafuente-Arroyo, and S. Maldonado-Bascon,
“Traffic sign shape classification based on support vector machines and the FFT of the signature of
blobs,” in Intelligent Vehicles Symposium, 2007 IEEE. IEEE, 2007, pp. 375–380.
[89] Y.-Y. Nguwi and A. Z. Kouzani, “Detection and classification of road signs in natural environments,”
Neural computing and applications, vol. 17, no. 3, pp. 265–289, 2008.
[90] J. Greenhalgh and M. Mirmehdi, “Real-time detection and recognition of road traffic signs,” IEEE
Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1498–1506, 2012.
[91] P. Paclík, J. Novovičová, and R. P. Duin, “Building road-sign classifiers using a trainable similarity
measure,” IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 3, pp. 309–321,
2006.
[92] Robot Operating System history. [Online]. Available: http://www.ros.org/history/
[93] tf2, ROS Wiki. [Online]. Available: http://wiki.ros.org/tf2
[94] rosserial_arduino, ROS Wiki. [Online]. Available: http://wiki.ros.org/rosserial_arduino
[95] varandaas/xuning, GitHub repository. [Online]. Available: https://github.com/varandaas/xuning
[96] T. Wiedemeyer, “IAI Kinect2,” https://github.com/code-iai/iai_kinect2, Institute for Artificial Intelligence, University Bremen, 2014–2015, accessed June 12, 2015.
[97] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
[98] R. B. Rusu and S. Cousins, “3D is here: Point Cloud Library (PCL),” in IEEE International Confer-
ence on Robotics and Automation (ICRA), Shanghai, China, May 9-13 2011.
[99] pointcloud_to_laserscan, ROS Wiki. [Online]. Available: http://wiki.ros.org/pointcloud_to_laserscan
[100] T. Moore and D. Stouch, “A generalized extended kalman filter implementation for the robot op-
erating system,” in Proceedings of the 13th International Conference on Intelligent Autonomous
Systems (IAS-13). Springer, July 2014.
[101] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for grid mapping with rao-
blackwellized particle filters,” IEEE transactions on Robotics, vol. 23, no. 1, pp. 34–46, 2007.
[102] T. Coroiu and O. Hinton, “A platform for indoor localisation, mapping, and data collection using an
autonomous vehicle,” Master’s Theses in Mathematical Sciences, 2017.
[103] amcl, ROS Wiki. [Online]. Available: http://wiki.ros.org/amcl
[104] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object
detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition,
2016, pp. 779–788.
[105] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
A Printed Circuit Board
Figure A.1: Circuit board
Figure A.2: Circuit schematics
B Competition results
Overall scores
TEAM SCORE D1 D2 D3 D4 P1 P2 B1 B2 V1
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
ROTA-2018 UA/IEETA 779 146 209 266 245 28 28 60 60 60 60
Conde FEUP 963 187 195 206 22 37 60 0 60 60 60 150 285 115
N3E Xuning ISR 942 141 231 40 0 35 255 0 91 228 0 0 108 50 60 0 50 60
VAI Sozinho IPLeiria 508 0 108 272 176 62 0 60
D1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 4 3 4 4 4 3 4 1 0 2 4
time 30 174 31 54 133 115 59 79 20 22 48
Early departure 15
Going outside of the track (partial) 10 2 1 1
Going outside of the track (total) 20 1 1 1 1
Stopping outside the stopping area 10 1
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 146 209 266 187 195 141 231 40 0 108 272
M1 M2 M2 M1 M3 M1 M2 M4 M1 M2 M3
D2 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 4 4 0 1 4 4 2 0
time 35 70 134 0 30 75 54 38 0
Early departure 15
Going outside of the track (partial) 10
Going outside of the track (total) 20 1 1 1
Stopping outside the stopping area 10 1
Wrong direction at the signaling panels 25 1 2 2
Not stopping at the stop sign 60 1
Early departure at the stop sign 25
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 245 206 0 35 255 176 62 0
M4 M2 M2 M2 M4 M4 M4 M4
D3 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 1 1 1 1 1 2 4
time 40 17 17 33 33 28 44 112
Early departure 15 1
Going outside of the track (partial) 10 1 1
Going outside of the track (total) 20 1 1 1 1 1
Stopping outside the stopping area 10 1
Wrong direction at the signaling panels 25 1 1 2 1
Not stopping at the stop sign 60
Early departure at the stop sign 25
Small collision with track accessories 10 1
Collision with track accessories altering geometry 20 1
SCORE 28 28 22 37 0 91 228
M4 M4 M4 M4 M3 M3 M3
D4 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 0 0 2
time 45 0 0 52
Early departure 15
Going outside of the track (partial) 10 2
Going outside of the track (total) 20
Stopping outside the stopping area 10 1
Wrong direction at the signaling panels 25
Not stopping at the stop sign 60
Early departure at the stop sign 25
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 0 0 108
M4 M4 M4
V1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
sign type 25 3 6 3
sign number 25 3 6 3
Early departure 15 1 1
Going outside of the track (partial) 10
Going outside of the track (total) 20 1
Stopping outside the stopping area 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 150 285 115
M3 M4 M4
P1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 1 1 1 1
Early parking manoeuvre 10 1
Going outside of the track (partial) 10
Going outside of the track (total) 20
Irregular parking manoeuvre (out of bounds) 10
Irregular parking manoeuvre (out of alignment) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 60 50 60 60
M2 M1 M1 M1 M2
P2 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 0 1
Early parking manoeuvre 10
Going outside of the track (partial) 10 1
Going outside of the track (total) 20
Irregular parking manoeuvre (out of bounds) 10
Irregular parking manoeuvre (out of alignment) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 0 60
M2 M2 M3
B1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 1 0
Early parking manoeuvre 10
Irregular parking manoeuvre (out of bounds) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 60 0
M3 M2 M2
B2 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 1 1 1
Early parking manoeuvre 10 1
Irregular parking manoeuvre (out of bounds) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 60 50 60
M4 M2 M3 M4