Autonomous Driving of Competition Robot
Localization and Environment Perception
Ruben Daniel Nunes Capitao
Thesis to obtain the Master of Science Degree in
Electronics Engineering
Supervisor: Pedro Manuel Urbano de Almeida Lima
Examination Committee
Chairperson: Pedro Miguel Pinto Ramos
Supervisor: Pedro Manuel Urbano de Almeida Lima
Member of the Committee: Carlos Baptista Cardeira
October 2018
Acknowledgments
Firstly, a huge thanks to Francisco for sharing this thesis, and for all the time, effort, and dedication to this project; without him, this thesis would not have been possible. To my supervisor, Pedro Lima, for his guidance and advice whenever I needed them most. I would also like to share my words of appreciation with all the people involved in N3E and N3E Robotics, for always sharing their ideas and knowledge, and for their much appreciated help when it comes to the competitions' bureaucracies. Also, to my parents and girlfriend, for consistently providing all the necessary means and support for me to accomplish all I ever wanted.
To each and every one of you, I express these words of gratitude!
Abstract
With the objective of competing in the Autonomous Driving challenge of the Portuguese Robotics Open 2018, an autonomous mobile robot based on a remote-controlled car was entirely designed, implemented and tested, including its hardware and software. Given its broad scope, the full extent of the competition challenges is covered across two parallel theses. On the hardware side, the present thesis focuses on the electronics of the robot; on the software side, on the localization and environment perception modules. It proposes an approach different from traditional lane-following techniques, in which the robot is fully aware of its surroundings and of its absolute localization. Although the track is planar, i.e., without any surrounding walls, a novel algorithm is developed to generate virtual walls over the lines of the track, enabling occupancy-grid-based localization using Monte Carlo Localization. The same algorithm is capable of identifying the track and segmenting it from the obstacles standing on it, allowing their detection. Finally, the recognition of semaphores and traffic signs based on state-of-the-art approaches, namely image processing and machine learning, is also developed.
Keywords
Autonomous Driving; Localization; Perception of the Environment; Robot Operating System.
Resumo
With the objective of competing in the Autonomous Driving challenge of the Festival Nacional de Robótica 2018, an autonomous car based on a remote-controlled car was created, implemented and tested, including its hardware and software. Given its complexity, the full set of competition challenges is presented across two parallel theses. Regarding the hardware, the present thesis focuses on the electronics of the robot and on the localization and environment perception software modules. This thesis proposes an approach different from traditional line-following techniques. The robot is fully informed about its surroundings and about its absolute localization. Although the track is planar, i.e., without surrounding walls, a new algorithm is developed to generate virtual walls along the lines of the track, enabling the use of occupancy grids with Monte Carlo Localization. The same algorithm is capable of identifying the track and segmenting it from the obstacles on it, allowing their detection. Finally, the modules necessary for the recognition of semaphores and traffic signs are also developed, based on state-of-the-art approaches, namely image processing and machine learning.
Palavras Chave
Autonomous Driving, Localization, Perception of the Environment, Robot Operating System.
Contents
1 Introduction 1
1.1 Purpose and motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Goals and challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Original Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Document organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2 Related Work 7
2.1 Robot pose estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1.A Bayesian framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1.B Gaussian filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.1.C Particle filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Sensors and techniques used for pose estimation . . . . . . . . . . . . . . . . . . 11
2.1.2.A Wheel odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2.B Visual odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2.C Inertial Measurement Unit . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2.D Sensor Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Perception of the Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1.A Types of representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1.B Static vs Dynamic environments . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Traffic sign detection and recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3.A Color-based detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.3.B Shape-based detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.3.C Hybrid Color-Shape-based detection . . . . . . . . . . . . . . . . . . . . 18
2.2.3.D Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3 System Architecture and Components 20
3.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 Robot chassis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.2 Actuators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.3 Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1.4 Hardware interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.1.5 Computing platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Operating System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Software model of the robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.3 Software modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3.A Sensor calibration parameters . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3.B Ground, lines, and obstacles detection . . . . . . . . . . . . . . . . . . . 27
3.2.3.C Relative Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3.D Global Localization and Mapping . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.3.E Semaphore and Traffic Sign Recognition . . . . . . . . . . . . . . . . . . 28
4 Implementation 29
4.1 Hardware Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Power regulation and interface with small sensors and actuators . . . . . . . . . . 30
4.1.2 Bridging between the low-level hardware and the computing platform . . . . . . . . 31
4.2 Sensor calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Low-level sensors calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Kinect sensors calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3 Ground, lines and obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1 Ground plane segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1.A Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3.1.B User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.1.C Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2 Road lines extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2.A Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.2.B User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.2.C Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.3 Obstacle Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3.3.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.3.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Relative Pose Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1 Visual Odometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1.A User-Defined inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4.1.B Implementation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.2 Sensor Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4.2.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.4.2.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.5 Global Localization and Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.1 Global Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.1.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.5.1.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.5.2 Absolute Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.5.2.A User-Defined Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.5.2.B Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.6 Semaphores and Traffic Signs Detection and Recognition . . . . . . . . . . . . . . . . . . 48
4.6.1 Semaphore Detection and Recognition . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6.1.A Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.6.1.B User-defined inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.1.C Implementation details . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.2 Traffic Sign Detection and Recognition . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.6.3 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5 Evaluation 52
5.1 Performance of the modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.1 Ground and obstacles segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.1.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.1.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.1.2 Road lines to virtual walls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.2.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.1.2.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.1.3 Obstacle detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.3.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.3.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.1.4 Semaphore Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.4.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.4.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.1.5 Traffic Sign Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.5.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.5.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.1.6 Global Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.6.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.6.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.1.7 Global Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.7.A Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.7.B Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Performance of the system on the competition environment . . . . . . . . . . . . . . . . . 62
5.2.1 Robotics Open Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6 Conclusions and Future Work 64
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
A Printed Circuit Board 76
B Competition results 79
List of Figures
1.1 Track overview with zebra crossing, obstacle, and tunnel. Obtained from [1]. . . . . . . . . 3
1.2 Different instructions provided by the semaphores used in the competition. . . . . . . . . 3
1.3 Different traffic signs available on the competition. . . . . . . . . . . . . . . . . . . . . . . 4
3.1 Proposal of functional architecture, including both Hardware and Software modules . . . . 21
3.2 Robot chassis overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 tf2 transforms of the designed robot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Visual description and representation of the ground and obstacles segmentation algorithm. 36
4.2 Visual description and representation of the generation of the virtual laser scan from an
input colored point cloud. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Script interface used for the calibration of the CIELAB’s channels range values. . . . . . . 41
4.4 Visual demonstration of the output of the obstacle detection node, in red, from the
input obstacles point cloud, in green. . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Binary templates used by the template matching method. . . . . . . . . . . . . . . . . . . 50
4.6 Visual demonstration of the YOLO implementation. . . . . . . . . . . . . . . . . . . . . . . 51
5.1 Comparison between the testing and the (real) competition environment. . . . . . . . . . 53
5.2 Visual demonstration of the number of points belonging to the ground when an obstacle
is 10 cm from the car's front . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 Visual demonstration of the effect of distance on the number of points belonging to a track's line. 56
5.4 Global map of the track captured by GMapping on the test environment. . . . . . . . . . . 60
5.5 Visual demonstration of the path provided by the global localization, represented in
green, and by the relative pose estimation, represented in red. . . . . . . . . . . . . . 62
A.1 Circuit board . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
A.2 Circuit schematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
List of Tables
1.1 System specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
5.1 Ground and obstacles segmentation module performance on the different testing scenar-
ios, over 10,000 different point clouds for Scenario A and 1,030 for Scenario B . . . . . 55
5.2 Results of the virtual walls node based on distance for 5000 iterations. . . . . . . . . . . . 57
5.3 Results of the obstacle detection node based on distance for 5000 iterations. . . . . . . . 57
5.4 Results of the semaphore detection node for the different scenarios. . . . . . . . . . . 59
5.5 Results of the obstacle detection node based on distance for 5000 iterations. . . . . . . . 61
Acronyms
2D Two-Dimensional
3D Three-Dimensional
ADC Analog to Digital Converter
AMCL Adaptive Monte Carlo Localization
CPU Central Processing Unit
DC Direct Current
EKF Extended Kalman Filter
GPS Global Positioning System
GPU Graphics Processing Unit
I/O Input/Output
I2C Inter-integrated Circuit
IMU Inertial Measurement Unit
IR Infrared
ISO International Organization for Standardization
KF Kalman Filter
LED Light Emitting Diode
LIDAR Light Detection and Ranging
MCL Monte Carlo Localization
OpenCV Open Source Computer Vision Library
PCB Printed Circuit Board
PCL Point Cloud Library
PDF Probability Density Function
PF Particle Filter
PWM Pulse Width Modulation
RAM Random Access Memory
RANSAC RANdom SAmple Consensus
RC Radio Controlled
RGB-D Red, Green, Blue, and Depth
RGB Red, Green, and Blue
ROS Robot Operating System
RViz ROS Visualization
SAE Society of Automotive Engineers
UART Universal Asynchronous Receiver-Transmitter
UKF Unscented Kalman Filter
USB Universal Serial Bus
VO Visual Odometry
YOLO You Only Look Once
1 Introduction
Contents
1.1 Purpose and motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Goals and challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Original Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 Document organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1 Purpose and motivation
According to the European Commission, in 2016, 25,500 people died on European roads, more than 70 per day [2]. In addition, as stated in the European Union's ambitious Road Safety Programme [3], for every death there are an estimated four permanently disabling injuries (brain or spinal cord damage), eight serious injuries and 50 minor injuries. In a different study, in 2015, the National Highway Traffic Safety Administration stated that 94% of car crashes in the United States alone were due to human error [4].
Autonomous vehicles have proven to substantially increase road safety, not only by decreasing the number of car accidents and deaths, but also by improving traffic flow and efficiency. On the one hand, they allow the reduction of congestion, pollution, the demand for well-defined roads, and the excessive and expensive use of road illumination. On the other, they increase passenger comfort and can bring independence back to elderly and disabled people. Moreover, the travel time can be used to work or relax.
With this premise in mind, many car manufacturers have been releasing the first commercially available driving systems of the Conditional Automation type, seen as the first level of system capable of sensing the surrounding environment, as described in the Society of Automotive Engineers (SAE) International Standard J3016 [5].
Although continuous technological advancements are witnessed on a daily basis, much remains to be done before commercially available fully autonomous systems are a reality. In recent years, a great deal of work and research has been carried out, not only by car manufacturers but also by academia and other institutions.
At the national level, for the past years, the Sociedade Portuguesa de Robótica has organized the Autonomous Driving competition of the Portuguese Robotics Open [1]. Designed for students, this competition aims to promote the development of autonomous vehicles in a scenario that resembles an urban road.
The purpose of this work is to build on the state-of-the-art literature on self-driving cars and create an affordable solution that can be extrapolated to real-life scenarios, in an attempt to reduce the number of car accidents and the associated injuries and deaths.
1.2 Problem statement
The objective of the Autonomous Driving competition of the Portuguese Robotics Open is to have a fully autonomous robot perform a set of road challenges that are similar to real-life scenarios.
The competition environment is partially depicted in Fig. 1.1 and consists of a closed, eight-shaped track that fits in a 16.70 m × 6.95 m rectangle, with two intersections, a zebra crossing, traffic lights, vertical traffic signs, and a parking area.
Figure 1.1: Track overview with zebra crossing, obstacle, and tunnel. Obtained from [1].
From the first to the last rounds, the assigned tasks increase in difficulty. The competition starts as a pure speed run, with just a start signal, and from one round to the next, different challenges are introduced. The challenges come in various forms: respecting the instructions provided by semaphores, recognizing vertical traffic signs, performing parallel and perpendicular parking, safely avoiding obstacles, and moving under diminished light (tunnel) and in under-construction areas.
The semaphores and their instructions are presented in Fig. 1.2.
(a) Left (b) Forward (c) Right (d) Finish (e) Stop (f) Parking
Figure 1.2: Different instructions provided by the semaphores used in the competition.
The different traffic signs that have to be recognized are depicted in Fig. 1.3.
(a) Bus Stop (b) Mandatory Medium Lights (c) Roundabout (d) Mandatory Left (e) Hospital (f) Park (g) Pedestrian Crossing (h) Recommended Speed (i) Animals (j) Depression (k) Lane Suppression (l) Caution
Figure 1.3: Different traffic signs available on the competition.
Given the broad scope of the areas concerned by the competition challenges, the full extent of the problem is split into two parallel theses, developed at the same time. On the hardware side, the present thesis focuses on the electronics of the robot; on the software side, on the modules required to model the environment, infer the absolute localization of the robot, and detect and recognize the objects placed on the track. This information is to be used by the navigation and decision-making modules, developed in Francisco's thesis [6].
1.3 Goals and challenges
The environment of the competition has a mixture of both indoor and outdoor characteristics. As it takes place inside a warehouse, the Global Positioning System (GPS) is not available, resembling an indoor environment. However, the track is placed in a flat open-space area without any nearby walls, resembling an outdoor environment.
The objects inserted on the track for the different challenges, e.g., obstacles and traffic signs, do not have a fixed position and, consequently, cannot be used for global localization purposes. The only information available for absolute localization is the road's landmarks. Due to the lack of fixed Three-Dimensional (3D) objects, Light Detection and Ranging (LIDAR) sensors are practically unusable for localization purposes, since they only perceive distance, not color. In these conditions, the use of vision is almost mandatory.
Although the dimensions of the track are known, since it is constructed manually and the carpet can fold, the exact location and dimensions of the road's landmarks may be subject to errors. As described further in this document, an accurate map is needed for an accurate position estimate. Manually modeling the map is very time-consuming and error-prone. Since the amount of time before the competition is very limited, the map needs to be constructed automatically by navigating the track before or during the competition.
As described in the competition's rules [1], each round is scored according to an equation that takes two aspects into consideration: execution time and penalties. Penalties are given for infractions committed during each challenge. This forces all competitors to create driving systems that are both reliable and fast, which also imposes a real-time constraint. Taking those aspects into consideration, and adding the motivation to address a real-world problem with a limited amount of resources, it was decided to prototype a non-holonomic car-like robot.
Considering the competition rules, the experience from previous performances, and the restrictions on the available resources, a set of specifications for the robot is defined. These are outlined in Table 1.1.
Table 1.1: System specifications.

Target speed: 2 m/s
Position accuracy: ±0.025 m
Orientation accuracy: ±5°
Obstacle detection accuracy: ±0.05 m
Obstacle identification rate: 5 Hz
Pose estimation rate: 10 Hz
Velocity is a key factor in the competition. The highest speed from previous competitions was approximately 2 m/s; thus, it is set as the target. This also sets the real-time constraints for the system and the developed modules.
The position accuracy is determined by the competition rules. At the end of a challenge, the au-
tonomous vehicle must stop with its front-most part within a 5 cm thick line, hence the defined accuracy.
The remaining specifications are defined to ensure that the navigation and decision-making modules
have the necessary accuracy and rate to drive the car at the imposed target speed and quickly react to
the challenges ahead.
1.4 Original Contributions
A list of the original contributions of the present thesis is provided below:
• Dimensioning and design of the necessary hardware components that support and power the autonomous vehicle and serve as the basis for an academic research platform. This includes reviewing state-of-the-art sensors and actuators and their applicability to the competition environment, as well as the necessary hardware to bridge the low-level sensors with the Robot Operating System (ROS).
• Implementation of a novel algorithm that simulates virtual walls from the track's lines, enabling their use with Monte Carlo Localization (MCL) algorithms.
• Adaptation and customization of state-of-the-art algorithms used for the mapping and localization of a robot.
• Implementation of state-of-the-art methods for semaphore and traffic sign detection, which range
from image processing to machine learning techniques.
1.5 Document organization
With the motivation, goals, and contributions of the present thesis described, the organization of the remainder of the document is as follows:
Chapter 2 reviews the state-of-the-art literature applied in the mobile robotics field that can also be applied in the context of the Robotics Open's Autonomous Driving challenge.
Chapter 3 presents a proposal of the hardware and software components that are used and implemented in this work. The system architecture and the contribution of each component to the end goal are also described.
Chapter 4 outlines the design and implementation of each individual hardware component and software node. For the software nodes, the technical aspects of each algorithm are also described in more detail.
Chapter 5 describes the evaluation methodologies carried out to determine the performance of each individual node. On a broader scope, the input of each module to the end goal is characterized, along with the performance in a real scenario: the Robotics Open's Autonomous Driving competition.
Chapter 6 summarizes the accomplishments of the present work and provides an overview of the
improvements and future work that can be done using this thesis as a basis.
2 Related Work
Contents
2.1 Robot pose estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Perception of the Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1 Robot pose estimation
2.1.1 Localization
A robot's localization is defined as the problem of estimating its pose relative to its surrounding environment. The robot's pose is composed of its location and orientation relative to a global coordinate frame. Therefore, localization can be viewed as a coordinate transformation problem.
In a real scenario, there is no way of obtaining the pose directly from a single sensor measurement, according to Thrun et al. [7]. As stated by the authors, the reason for this is a key element of robotics: uncertainty.
Virtually all mobile robot localization literature makes use of probabilistic laws to account for uncertainty. This comes from the fact that, for a robot to be able to perform its task without any human intervention, it needs to deal with its uncertainty [8, 9].
In probabilistic robotics, to cope with uncertainty, instead of a single true state hypothesis, the state
is represented by a Probability Density Function (PDF) over the possible values.
The PDF of the state is coined in [7] as the robot's belief. Belief can be seen as the robot's internal knowledge about the state of the environment, built from all past observations and controls while accounting for their uncertainties. Hence, the belief of a state variable x at time t, x_t, is defined by the following equation:

bel(x_t) = p(x_t \mid z_{1:t}, u_{1:t}),  (2.1)
where z_{1:t} represents all past sensor measurements and u_{1:t} all past controls. The reason behind the distinction between control and sensor data will be covered further ahead.
To calculate the belief distribution, the most widely used framework is the Bayesian framework. It is interesting to see how even the most recent literature is still based on this framework, 50 years after it was introduced by Sorenson and Stubberud [10] for approximation of the a posteriori density function.
2.1.1.A Bayesian framework
The Bayesian theorem has served as a basis to the majority of localization algorithms that have been introduced; thus, understanding its concepts is essential. The Bayes filter performs state estimation recursively, and each iteration is composed of two steps: prediction and correction [7].
Prediction or control update attempts to estimate the current state using only the controls provided to the robot, before the sensor measurements are included. To distinguish it from the posterior, bel(x_t), it is usually denoted \overline{bel}(x_t) and defined by the equation:

\overline{bel}(x_t) = \int p(x_t \mid u_t, x_{t-1}) \, bel(x_{t-1}) \, dx_{t-1}.  (2.2)
Correction or measurement update uses the sensor data to tighten the state PDF, i.e., to increase the overall certainty of the robot. This is accomplished by multiplying the a priori belief, \overline{bel}(x_t), by the measurement probability, which describes how the measurement z is generated from the environment state x. This step is expressed by the equation:

bel(x_t) = \eta \, p(z_t \mid x_t) \, \overline{bel}(x_t),  (2.3)

where \eta is a normalizing constant.
Due to its complexity, the original algorithm can only be implemented for very simple estimation problems. To deal with realistic scenarios, numerous approximations of the Bayes filter algorithm were introduced. The most frequently used derivations are Gaussian and particle filters.
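The predict/correct recursion above can be sketched with a discrete Bayes filter over a one-dimensional grid of candidate poses. This is a minimal illustration, not the implementation used in this thesis; the motion kernel and measurement likelihood below are made-up values.

```python
import numpy as np

def predict(belief, motion_kernel):
    # Control update: spread the belief according to the (noisy) motion model,
    # a discrete version of bel_bar(x_t) = integral p(x_t | u_t, x') bel(x') dx'.
    return np.convolve(belief, motion_kernel, mode="same")

def correct(belief_bar, likelihood):
    # Measurement update: bel(x_t) = eta * p(z_t | x_t) * bel_bar(x_t),
    # where dividing by the sum plays the role of the normalizer eta.
    posterior = likelihood * belief_bar
    return posterior / posterior.sum()

# Uniform prior over 10 grid cells: the robot is globally uncertain.
belief = np.full(10, 0.1)

# Illustrative "stay in place with slippage" motion model.
motion_kernel = np.array([0.1, 0.8, 0.1])

# Illustrative sensor likelihood peaking near cell 3.
likelihood = np.array([0.01, 0.05, 0.2, 0.5, 0.2, 0.05, 0.01, 0.01, 0.01, 0.01])

belief = correct(predict(belief, motion_kernel), likelihood)
print(belief.argmax())  # most likely cell: 3
```

Iterating the two steps over successive controls and measurements yields exactly the recursion of the Bayes filter; the Gaussian and particle filters discussed next replace the grid with parametric and sample-based representations of the belief, respectively.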
2.1.1.B Gaussian filters
Gaussian filters, as the name implies, approximate the posterior PDF by a multivariate normal distribution. This means that the posterior is solely described by its mean, μ, and covariance, Σ.
The most notable and studied implementations of Gaussian filters are the Kalman Filter (KF) and
its variants [7]. The KF was introduced around 1960 and is named after its author, Rudolf
Kalman [11]. It is a technique for filtering and prediction in linear systems. Although it was not originally
designed for robotics applications, this method has been heavily studied in the field for its ability to
implement the Bayes filter [12, 13]. In practice, however, there are very few occasions where systems
are truly linear, rendering the linear KF unusable in most real-world systems. To tackle this limitation,
various extensions of the original Kalman Filter were introduced.
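The predict/correct structure of the KF is easiest to see in a scalar sketch; the random-walk motion model and the noise values q and r below are illustrative.

```python
# One predict/correct cycle of a scalar Kalman filter for the linear system
# x_t = x_{t-1} + u_t + noise, z_t = x_t + noise.

def kf_step(mu, sigma2, u, z, q=0.1, r=0.5):
    # Prediction (control update): shift the mean, grow the variance.
    mu_bar = mu + u
    sigma2_bar = sigma2 + q
    # Correction (measurement update): blend prediction and measurement
    # through the Kalman gain k.
    k = sigma2_bar / (sigma2_bar + r)
    mu_new = mu_bar + k * (z - mu_bar)
    sigma2_new = (1.0 - k) * sigma2_bar
    return mu_new, sigma2_new

mu, s2 = 0.0, 1.0                       # initial state estimate and variance
mu, s2 = kf_step(mu, s2, u=1.0, z=1.2)  # commanded 1 m forward, measured 1.2 m
```

The posterior mean lands between the prediction and the measurement, and the variance shrinks - the correction step tightening the PDF, as described for the Bayes filter.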
The Extended Kalman Filter (EKF) [14] is able to overcome the linearity assumption of regular KF by
linearizing the system using first order Taylor expansion. By performing linearization, the computational
complexity of the system is reduced. The EKF has shown good practical results when systems are
approximately linear [15]. For that reason, it has become one of the most popular methods for state
estimation [7]. Nevertheless, since it is based on linearization, when systems are highly non-linear
the EKF approximation presents a significant amount of error; in [16], this was shown to be its major
weakness.
With this in mind, Julier et al. [17] introduced the Unscented Kalman Filter (UKF). Its name derives
from the used sampling technique, the unscented transform. To approximate the non-linear system, a set
of sample points, denominated by the authors sigma points, are chosen around the state mean. These
points are then propagated through the non-linear function, from which a new mean and covariance are deduced.
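For a scalar state, the unscented transform can be sketched as follows; the simplified weights (shared between mean and covariance) and the test function g are illustrative.

```python
import math

# Unscented transform sketch for a scalar state (n = 1): pick sigma points
# around the mean, push them through a non-linear function g, and recover a
# new mean and variance from the weighted transformed points.

def unscented_transform(mu, sigma2, g, lam=2.0):
    n = 1
    spread = math.sqrt((n + lam) * sigma2)
    points = [mu, mu + spread, mu - spread]
    weights = [lam / (n + lam), 1 / (2 * (n + lam)), 1 / (2 * (n + lam))]
    ys = [g(p) for p in points]
    mu_y = sum(w * y for w, y in zip(weights, ys))
    sigma2_y = sum(w * (y - mu_y) ** 2 for w, y in zip(weights, ys))
    return mu_y, sigma2_y

# For g(x) = x^2 and a standard Gaussian, the true output mean is 1 and the
# true variance is 2; the transform recovers both without any linearization.
mu_y, s2_y = unscented_transform(0.0, 1.0, g=lambda x: x ** 2)
```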
As with all KF approaches, it only solves what is known in the literature as the position tracking or local
localization problem [18]. This is a consequence of representing belief using a normal distribution, which
is unimodal and, therefore, confined to a single peak. Moreover, these approaches rely on the assumption
that the initial state of the robot is known and that the uncertainty is always kept small [7].
To deal with global localization and the kidnapped robot problem, multi-modal approaches were
introduced. In the next section, this new set of filters is further described.
2.1.1.C Particle filters
Particle Filters (PF) were not the first multi-modal approach to localization. The first multi-modal ap-
proach was Grid-based Markov localization [19]. However, in [20], PFs were proven to be more memory
efficient, faster, more accurate, and easier to implement. For that reason, although it is historically
important, Grid-based Markov localization is not covered in this document.
The PF's name and success come from adding a set of randomized samples, also called particles, during
the prediction step [20]. This way, instead of having to reason about the probability distribution of the state, it
simply tries to randomly guess a possible one in a way that favors likely positions over unlikely ones. Due
to that, it has even been called the survival of the fittest algorithm [21], although it is mostly known as MCL [22]
or the condensation algorithm in computer vision literature [23].
The key idea of PF is to have N random particles that possess information about the robot’s pose
along with a weight that is related to the probability of that being the correct pose. The advantage of
Particle Filters over Gaussian ones is that the initial position does not need to be known. Due to its
survival-of-the-fittest mechanism, it converges to the correct pose at a rate of 1/√N. This means that the
bigger the number of samples, the faster it converges. However, this naturally comes with a higher
computational cost too.
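A minimal MCL sketch on a hypothetical 1D corridor; the landmark position, the range-only measurement model, and the noise values are invented. Note that the measurement z = 4 is consistent with two poses (x ≈ 1 and x ≈ 9), so the particle set legitimately keeps both peaks - exactly the multi-modality a Gaussian filter cannot represent.

```python
import math
import random

# Minimal MCL sketch: N weighted pose hypotheses are moved by a noisy motion
# model, weighted by the likelihood of a range measurement to a single
# landmark, and resampled.

def mcl_step(particles, u, z, landmark=5.0, motion_noise=0.1, meas_noise=0.5):
    # Prediction: sample each particle through the motion model.
    moved = [x + u + random.gauss(0.0, motion_noise) for x in particles]
    # Correction: weight each particle by p(z | x).
    weights = [math.exp(-0.5 * ((z - abs(landmark - x)) / meas_noise) ** 2)
               for x in moved]
    # Resampling: "survival of the fittest" - likely poses are drawn more often.
    return random.choices(moved, weights=weights, k=len(moved))

random.seed(0)
particles = [random.uniform(0.0, 10.0) for _ in range(500)]  # unknown initial pose
for _ in range(5):
    particles = mcl_step(particles, u=0.0, z=4.0)  # robot is 4 m from the landmark
```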
Although regular MCL has advantages over other multi-modal approaches, it has its pitfalls too.
Since the particles are inserted randomly, it might happen that none of the generated particles is near
the correct pose. This effect is known as the particle deprivation problem [7].
Another disadvantage (and a highly counter-intuitive one) is that regular MCL becomes arbitrarily in-
accurate the more accurate the sensors are. To picture this, one must imagine a scenario where the sensors
have no noise. In such conditions, the state probability will be zero for all states except the single Dirac
peak that actually corresponds to the true state. When generating samples, it is very unlikely to
generate a sample that matches the robot's true pose exactly. Consequently, when assessing the im-
portance of the samples, all will have arbitrarily low weight unless, by accident, one actually matches the
true pose. Although this is a hypothetical scenario and there are no noise-free sensors, it exemplifies
the problem in regular MCL, which arises from the fact that sensor measurements are not taken into account
in the prediction step.
A variant called Mixture-MCL [24] claims to mitigate the aforementioned problems by using MCL com-
bined with its “dual”. The “dual” of MCL can be seen as an order inverted MCL, i.e., it first incorporates
sensors measurements to guess poses and, only then, it takes control data to validate the compliance
of the guess. In [24], it is stated that neither MCL nor dual MCL alone provides a good guess but that,
together, they can achieve much better results for a smaller number of samples. As described, the
computational cost increases with the number of samples N, thus, it is preferable to keep this value as
small as possible.
Also with this idea in mind, the Adaptive Monte Carlo Localization (AMCL) [20] was introduced.
The key idea in this approach is that the number of samples varies depending on the uncertainty of
the robot. Whenever the probability associated with all samples is small, it means that the certainty that
those correspond to a correct pose is also small. For that reason, to make the filter converge faster, the
number of samples is increased. It can also work the other way around, i.e., when there are too many high-
probability samples, the number of samples is reduced. This way, computational resources can be
efficiently managed without compromising the overall accuracy.
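The published method derives the sample count from a statistical bound; the toy sketch below only captures the grow-when-uncertain, shrink-when-confident intuition described in the text, with purely illustrative thresholds and bounds.

```python
# Caricature of adaptive sample sizing: grow the particle set when all weights
# are low (high uncertainty), shrink it when many particles score highly.

def adapt_sample_size(weights, n_min=100, n_max=5000, low=0.2, high=0.8):
    avg = sum(weights) / len(weights)   # weights assumed normalized to [0, 1]
    if avg < low:                       # uncertain: add samples to converge faster
        return min(n_max, 2 * len(weights))
    if avg > high:                      # confident: free computational resources
        return max(n_min, len(weights) // 2)
    return len(weights)                 # otherwise keep the current sample count
```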
More recently, in 2009, an approach called Self-adaptive Monte Carlo Localization [25] was intro-
duced to deal with the randomness problem described. In order to reduce the computational burden of
traditional MCL approaches, the authors added the concept of Similar Energy Regions. Similar Energy
Regions are calculated when the map is built and pre-cached before the localization process starts. The
key difference is that, instead of adding samples randomly across the map, they are added in regions that
have an increased likelihood of containing the robot. By adding samples in smaller regions, the
number of samples can be reduced, consequently decreasing the computational cost.
2.1.2 Sensors and techniques used for pose estimation
To decrease the overall uncertainty associated with the robot pose estimation, accurate sensor data
is required. To achieve this, many authors have used different sensors and techniques to increase
the accuracy of their systems. With this in mind, the literature’s most commonly used sensors and
methodologies that can be applied in the competition’s premises are reviewed.
2.1.2.A Wheel odometry
The most widely used source of pose information in mobile robotics is, incontestably, wheel odometry. This classical
technique, used in wheeled robots, is based on the assumption that wheel revolutions translate
into linear movement of the vehicle. Since the majority of mobile robots are wheeled, it is natural that it
is so vastly used.
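A minimal dead-reckoning sketch, assuming a differential-drive robot with wheel encoders; the tick resolution, wheel radius, and wheel base are illustrative. The cumulative nature of odometry is visible in the code: every increment is rotated by the previous heading, so any heading error contaminates all subsequent positions.

```python
import math

# Incremental dead-reckoning sketch: encoder ticks per wheel are converted
# into a pose increment (x, y, theta). Parameters are illustrative.

TICKS_PER_REV = 1024
WHEEL_RADIUS = 0.05   # m
WHEEL_BASE = 0.30     # m (distance between the two wheels)

def odom_update(x, y, theta, ticks_left, ticks_right):
    d_left = 2 * math.pi * WHEEL_RADIUS * ticks_left / TICKS_PER_REV
    d_right = 2 * math.pi * WHEEL_RADIUS * ticks_right / TICKS_PER_REV
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / WHEEL_BASE
    # Each increment inherits the error of the previous heading estimate.
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return x, y, theta + d_theta

pose = (0.0, 0.0, 0.0)
pose = odom_update(*pose, 1024, 1024)  # one full revolution per wheel: straight line
```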
Odometry is calculated incrementally and, due to its cumulative nature, it is very sensitive to errors.
In the literature, two distinct types of errors are described: systematic and non-systematic [26].
Systematic errors stem from the intrinsic characteristics of the robot, according to the studies of Borenstein
and Feng [26]. As pointed out by the authors, these are mainly related to different wheel radii
and wheel misalignment. To compensate for those, the authors proposed a calibration technique,
named UMBmark.
Non-systematic errors can be interpreted as random perturbations to a robot's position. These are
usually associated with wheel slippage, wheel backlash, or uneven floor. In an attempt to mitigate their effect,
Wang [27] modeled the perturbations using a mean vector and covariance matrix to compensate for the
errors. However, modeling the error is not a trivial task.
For that reason, more recently, Doh and Chung [28] proposed a method that estimates the error
parameters and covariance matrix. The method consists of navigating the same path twice with the sys-
tematic errors fully compensated, the difference between the two odometric paths being the non-systematic
error. The authors claim numerous advantages over previous techniques, namely: better results than
the previously most used technique, UMBmark; a model that can be extended to all mobile states, such
as the heading angle of the robot; and applicability to different kinematic systems.
2.1.2.B Visual odometry
Since the use of vision is practically mandatory for this challenge, as described in Section 1.3, it can
be used as an alternative to wheel odometry.
The concept of visual odometry is simple: estimate the motion from consecutively captured images.
Since it incrementally attempts to estimate the pose of the camera, similarly to wheel odometry, it has
been coined as Visual Odometry (VO). In the literature different types of cameras have been used for
this purpose. Their strengths and weaknesses are summarized below.
Classical techniques simply use a monocular camera [29]. However, since monocular cameras are
aware of neither scale nor depth, the classical technique performs worse than the remaining candidates, so
it will not be detailed further.
Nister et al. [30] proposed a system that, contrary to the previous one, uses more than one camera.
With this approach it is possible to estimate the 3D position of landmarks from consecutive images,
rendering it very accurate. The authors have shown errors of 1 % over the path's length, which is
considerably accurate for camera-based odometry. One of the benefits of visual odometry over
wheel encoders is that it is not affected by the typical wheel odometry sources of error, namely,
slippage and skid. However, having more than one camera renders the setup hard to calibrate, and it is
computationally heavier, since their algorithm was capped at 13 Hz.
Steinbrucker et al. [31] presented a novel approach to visual odometry using Red, Green, Blue, and
Depth (RGB-D) sensors like Microsoft's Kinect. Since depth data is directly output by the Kinect
sensor, this approach stands out for its great accuracy and low computational effort when compared
to previous stereo camera approaches. The authors accomplish this by means of energy minimization
to estimate the camera motion between consecutive RGB-D images.
2.1.2.C Inertial Measurement Unit
An Inertial Measurement Unit (IMU) uses a combination of sensors that allows the measurement
of actuation forces and angular velocities of a body. The sensors generally comprised within these
units are accelerometers and gyroscopes. The combination of the two provides a reliable source
of orientation; however, since both sensors only provide relative measurements, the orientation tends to drift over extended
periods of time [32]. This drift can be fully compensated for if an absolute source of orientation is
added. By measuring Earth's magnetic field, a compass is capable of contributing the necessary
absolute orientation reference and is commonly included in IMUs to reduce the drift to an insignificant
value [33].
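The gyro/compass combination is often realized as a simple complementary filter; the blend factor, gyro bias, and rates below are illustrative. Pure integration of the simulated 0.01 rad/s bias would accumulate 0.04 rad of drift over the 200 steps, while the small compass correction keeps the estimate bounded near zero.

```python
# Complementary filter sketch for heading: integrate the (relative) gyro rate,
# then pull the estimate slightly towards the (absolute) compass reading.

def fuse_heading(heading, gyro_rate, compass, dt, alpha=0.98):
    gyro_estimate = heading + gyro_rate * dt              # smooth but drifts
    return alpha * gyro_estimate + (1 - alpha) * compass  # compass bounds the drift

heading = 0.0
for _ in range(200):   # stationary robot, biased gyro, compass reads 0 rad
    heading = fuse_heading(heading, gyro_rate=0.01, compass=0.0, dt=0.02)
```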
2.1.2.D Sensor Fusion
In the literature, the fusion of odometry sources, whether wheel- or vision-based, with inertial
data is proven to be a more accurate approach than using the different types of data separately [32,34–
37]. This comes from the fact that odometry and inertial data are complementary to each other.
Even though the sensors used may differ, all authors have one thing in common: the use of the KF or
one of its variants to fuse the different types of data. Due to its small computational requirements, elegant recursive
properties, and ability to mix data arriving at different frequencies, it is the most widely used algorithm for the
task of sensor fusion. These filters were already described in Section 2.1.1.B.
2.2 Perception of the Environment
2.2.1 Mapping
A map is defined as the robot's representation or knowledge of its surrounding environment. The in-
formation it carries is crucial for an autonomous vehicle to estimate its position relative to the observed
world. It has been considered in [38] as one of the most important aspects to achieve truly autonomous
mobile robots. Due to its importance, different types of representation have been studied over time.
These highlight different aspects of the environment that are necessary for the robot to fulfill its mis-
sion. Likewise, to navigate safely, the robot needs to represent not only its own position
but also the position of non-transposable objects. In this section, the distinct ways of representing knowledge
about the environment and its different variables are enumerated and summarized.
2.2.1.A Types of representation
Topological maps are a semantic world representation, usually represented as a connected graph
[8, 39–41]. In this kind of representation, the nodes correspond to places of interest while the
connections between the nodes correspond to paths between those places. In order for a robot to localize
itself, it needs to uniquely identify the places via sensorial input. This representation allows an
abstract relation between nodes, which can reduce the complexity of trajectory planning tasks.
However, pure topological approaches avoid as much as possible geometrical or geographical
relations between nodes [42]. Without resorting to any form of metric information, performing
safe navigation on obstructed sites is nearly impossible [43], which is the major weakness of
topological maps.
Metric maps represent the environment as a geometric relation between objects and a fixed reference
frame. The geometric relation is metric, hence the name. Metric maps are further divided in two
main categories:
• Feature maps represent the environment with a collection of geometric shapes [44,45]. The
mapped shapes can go from corners or edges to more complex shapes like blobs. To create
the map, features are observed and compared to previously stored ones. Due to this fact,
Bailey [43] concluded that this mapping approach is very sensitive to false data association.
However, since the environment is reduced to its descriptors and these hold very little infor-
mation - position and shape, for instance - feature maps are computationally light and efficient
when compared to other representation forms.
• Occupancy grids get their name from the way the environment is represented - in a grid. In the
classical occupancy grid approaches [8, 46, 47], the grid size is fixed. Within the grid, each
cell represents a region of the environment. These cells contain the probability of being occu-
pied, ranging from 0 to 1, representing unoccupied or occupied space respectively. Because
it directly carries information about the transposable and non-transposable space, this form
of representation is able to reduce the complexity of navigation and decision-making mod-
ules and has proven to be more resilient to misreadings than feature-based mapping [43].
The major setback of traditional occupancy grids is the trade-off between accuracy and com-
putational effort. Maintaining a large occupancy grid can be computationally heavy. Arleo
et al. [48] introduced a novel adaptive grid-size technique to mitigate the impact of storing
and processing high-resolution grids. This way, cells can be kept at a lower resolution and, when
needed, each cell can be further subdivided in a hierarchical tree-based form. Due to its
form, this structure was later coined quadtree [49] or octree [50] for Two-Dimensional (2D) and 3D
environments respectively.
Appearance Maps describe the environment directly from raw sensorial data. The classical methods
mainly focused on laser range data [51–53] but the idea was later extrapolated to vision sensors [54–56].
As metric approaches only use part of the sensed data to perceive the environment (features, for
instance), using the sensors' direct output allows a much richer description of the environment to be
obtained [43]. The mapping characteristics of appearance-based approaches are very similar to
topological ones. Instead of recognizing that the robot is inside a place (topological map), in
appearance maps the pose of the robot is directly extracted from sensor data. To accomplish that,
a function to match new data against stored data needs to be created. In a real scenario, creating this function
can sometimes be an impossible task. This is true, for instance, in long or symmetric corridors,
which happens to be the case on the competition's track.
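The per-cell occupancy probability described above is typically stored in log-odds form, which turns every sensor update into a simple addition. A single-cell sketch with an illustrative inverse sensor model:

```python
import math

# Log-odds occupancy update for one grid cell: repeated "occupied" readings
# from a noisy sensor push the cell probability from 0.5 (unknown) towards 1.

L_OCC = math.log(0.7 / 0.3)    # log-odds increment when the cell is hit
L_FREE = math.log(0.3 / 0.7)   # log-odds decrement when the beam passes through

def update_cell(l, hit):
    return l + (L_OCC if hit else L_FREE)

def probability(l):
    """Convert log-odds back to an occupancy probability in [0, 1]."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

l = 0.0                        # prior: p = 0.5, unknown space
for _ in range(5):
    l = update_cell(l, hit=True)
p = probability(l)
```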
2.2.1.B Static vs Dynamic environments
Besides the form of representation, an environment is also characterized by the number of state
variables. The state can be seen as the collection of all aspects of the robot and its environment that can
impact the future.
An environment is defined as static when the only variable is the robot’s pose [43]. This means that
all of the environment's objects will always remain in the same place. On the other hand, when the environment's
knowledge changes over time (either objects are inserted, removed, or moved), the environment is
defined as dynamic.
Most of the classical mapping techniques assumed the environment to be static. In real-world envi-
ronments, this assumption is very unlikely to hold. Wolf and Sukhatme [57] proposed a separation
between the static and dynamic objects on the map. Later, the same authors showed in [58] an online ap-
plication of this method to localize the robot in a static map and navigate without collisions by avoiding
the objects in the dynamic map.
2.2.2 Obstacle detection
In the literature, mainly two types of sensors are used for the purpose of identifying obstacles: cam-
eras and laser scanners. Vision-based sensors have proven to be a much cheaper alternative to LIDAR-
based ones and, due to the cost limitation, state-of-the-art algorithms using laser scanners will not be
covered.
Since roads are approximately flat, obstacles are usually defined as objects that have height. As
such, non-transposable terrain can be identified by detecting objects that stand on the ground plane
but do not belong to it. In computer vision literature, non-transposable terrain is commonly obtained by
performing image segmentation of the different planes.
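Under the flat-road assumption described above, obstacle extraction reduces to a height test against the ground plane. A sketch, assuming the points are already expressed in a robot frame whose z = 0 plane coincides with the road; the clearance threshold and the sample cloud are illustrative.

```python
# Plane-based obstacle extraction sketch: points standing higher than a small
# clearance above the assumed ground plane z = 0 are marked non-transposable.

def segment_obstacles(points, ground_clearance=0.05):
    """points: iterable of (x, y, z) in metres; returns the obstacle subset."""
    return [p for p in points if p[2] > ground_clearance]

cloud = [(1.0, 0.0, 0.01), (1.2, 0.1, 0.02),   # road surface (sensor noise)
         (2.0, 0.5, 0.30), (2.0, 0.5, 0.45)]   # a box standing on the road
obstacles = segment_obstacles(cloud)
```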
In [59], a monocular camera is added to a car to detect road obstructions. The proposed system
works in three steps: depth map computation, obstacle extraction, and data fusion. Since a monocular
camera does not possess depth information, the depth map is computed from multiple consecutive frames.
Then, by assuming that the car is moving on a plane, any object lying on it is an obstacle. By fusing the
odometry data with the depth information, an occupancy grid map is generated and the non-transposable
objects are added.
To decrease the complexity of the system and the computational burden of calculating the depth
map from consecutive images, Fleischmann and Berns [60] used a stereo vision approach. The authors
demonstrate the applicability of a point cloud to distinguish soft objects found on the
road, like grass and weeds, from dangerous solid objects that cannot be traversed. This is achieved by an-
alyzing the density and distribution of the captured points and the normal of the fitted plane. Performing
neighborhood analysis on those clusters allows their classification as either obstacle or non-obstacle.
In the same way, Holz et al. [61] also identify the planes corresponding to the navigable surfaces
while identifying their obstacles. In their research, they take advantage of Microsoft's Kinect sensor to
combine Red, Green, and Blue (RGB) images with depth information to directly form a point cloud.
The Kinect's advantage comes from its cost, performance, and ease of use and configuration. Contrary to
stereo systems, it takes little to no effort to set up and is computationally faster, since the point cloud
does not need to be computed. Their research demonstrates how to segment the different planes to obtain
the navigable surface at frame rates up to 30 Hz.
Inspired by the latter work, Peasley and Birchfield [62] demonstrate the applicability of the Kinect's
depth information to generate, in real time, an occupancy grid with the transposable and non-transposable
terrain.
2.2.3 Traffic sign detection and recognition
Due to its applicability in real-world scenarios, a large amount of research on traffic sign detection
and recognition has been conducted. For the detection part, authors focus mainly on two aspects of the
visual information: color and shape. These aspects are natural choices, as color is used to call
human attention and shape is used for categorization. In the recognition part, the detected signs
are then compared with the set of all traffic signs for a potential match.
In this section, the literature on both color- and shape-based detection types and recognition tech-
niques is reviewed.
2.2.3.A Color-based detection
Cameras commonly output visual information in the RGB color space. The earliest techniques [63]
intuitively make use of the direct output of cameras to segment the image into potential sign candi-
dates. These regions of interest are selected according to their color information. Since parts of the
direct output of the cameras are selected according to certain color thresholds, this technique is compu-
tationally light. However, the values of Red, Green, and Blue pixels depend on the intensity of the light,
rendering the technique very sensitive to changes in ambient light conditions.
To overcome this dependency, the RGB image is usually converted into a different color space. Other
color spaces that have been used in the literature are YCbCr [64], XYZ [65], and HSI [66]. Although there
is no de-facto standard for the color space used in the literature, the most commonly used ones are the
hue-saturation color spaces - HSI, HSL, and HSV - according to the survey by Fu et al. [67]. Their
popularity stems from being based on human color perception. Recent studies continue to show interest in
human color perception, and even more complex color spaces are being utilized, like CIELAB, CIELUV,
and CIECAM [68]. The latter has displayed up to 8 % better results in different weather conditions when
compared to HSI.
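A sketch of why hue-saturation spaces help: the hue of a red pixel barely changes as lighting dims, so the same test keeps firing where a fixed RGB threshold would fail. The hue range and the saturation/value floors below are illustrative, not tuned values.

```python
import colorsys

# Hue-based candidate test sketch: convert an RGB pixel to HSV and check
# whether it falls inside an illustrative "red traffic sign" range. Red wraps
# around the ends of the hue circle, hence the two-sided test.

def is_red_candidate(r, g, b, s_min=0.5, v_min=0.2):
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    red_hue = h < 0.05 or h > 0.95   # roughly +/- 18 degrees around pure red
    return red_hue and s >= s_min and v >= v_min

bright_sign = is_red_candidate(200, 30, 30)   # well-lit red pixel
dim_sign = is_red_candidate(120, 20, 20)      # same sign at dusk: hue unchanged
asphalt = is_red_candidate(90, 90, 90)        # grey road surface
```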
Besides color-thresholding, region growing techniques [69] are also used to identify hypothesis re-
gions. As traffic signs are generally bounded by a color, this algorithm takes advantage of this and
attempts to identify closed colored shapes. If the closed condition is not met, the region is not consid-
ered.
In addition to the weather and daylight conditions, outdoor environments have a high amount of occlusion
and reflections that can alter the color of the observed signs. Since color information cannot be retrieved
in some occasions, shape-based techniques have been studied as an alternative to color-based detec-
tion.
2.2.3.B Shape-based detection
Given that the geometric form of the traffic signs is well-known, many authors proposed detection
techniques based on their outline. The first attempt to extract the object features used the Hough Trans-
form [70]. This transform is capable of detecting regular shapes such as lines or circles. Since it uses
gray-scale images, its more immune to variations in the light. Yet, Hough Transform is computationally
complex and memory hungry [71].
To reduce the computational complexity, an edge detector pre-processing of the image is used. The
most commonly used edge detectors in the literature for this purpose are Sobel [72], Laplacian [73],
and Canny [74]. Sobel and Laplacian detect edges based on the gradient of pixels from the first- and
second-order derivatives, respectively. Canny first uses a filter to reduce noise, then applies a Sobel operator and,
to select the best pixel for each edge, performs non-maximum suppression in the local neighborhood.
Due to the computational complexity of the Hough Transform, many authors sought alternatives to it.
Barnes et al. [75] have introduced a radial symmetry detector to detect Australian circular signs
in real-time. As the name implies, with this technique, the only regions of interest are the ones that
have radial symmetry. It was later extended [76] to also detect rectangular shapes. This method
displayed 95 % accuracy and proved to be very tolerant to in-plane rotations of the signs, i.e., rotations in the
plane of the image.
To deal with scale variation, Perez and Javidi [77] presented a detection mechanism based on
a composite filter and filter bank. Later, the same authors released a composite filter bank which
added the ability to detect signs with out-of-plane rotation, i.e., rotations out of the plane of the image.
Although shape-based methods have advantages over color-based ones, they also have their
own set of difficulties. As presented, to detect shapes, edge detection and matching algorithms are
needed. This makes it difficult to detect shapes when signs are distant or the image size is small.
Also, in cluttered scenes, many false positive shapes are detected, affecting the accuracy of the sign
recognition system.
2.2.3.C Hybrid Color-Shape-based detection
As the color and shape information are complementary, it is natural that authors combine them. The
advantages of each have been applied in the most recent works to increase the overall accuracy.
Nakamura et al. [78] proposed a system that first segments the image with color information and
then performs a shape analysis. In their work, the color segmentation is performed directly on RGB
images. Nonetheless, by applying a Neural Network to detect ring shapes, the authors claim 100 %
detection accuracy.
2.2.3.D Recognition
For the signs to be fully identified, after the shape and/or color is detected, the interior part of the
sign must be analyzed. This process is known as recognition. In the literature, the most commonly
used approaches for the purpose of recognition are template matching [74, 79–83], neural networks
[63,78,84,85], and, more recently, support vector machines [86–90].
Template Matching techniques assess the similarity between the identified regions and the sample
images provided beforehand. The templates are images or a derivative of them, e.g., a histogram or
pattern. Typically, a normalized cross-correlation method is utilized [79]. Due to its comparative
nature, this technique is very sensitive to the effects of reflection and occlusion. To mitigate this
effect, Paclik et al. [91] proposed a comparison of local features only. Despite this breakthrough,
as with the majority of template matching techniques, if the signs present rotation, the templates
must also include the rotation for an accurate identification. The latter creates an overhead in
the time required to generate the templates, as more rotated samples are required. Consequently,
the time to process through all of them also increases.
Neural Networks are inspired by biological neural systems and have also proven to be a useful
alternative for the recognition task. When compared to template matching, they are capable of more
accurate results. Very recently, in 2017, Shustanov et al. [85] demonstrated the application of a
Convolutional Neural Network to achieve 99.94 % accuracy. Not only were the authors capable
of reaching almost 100 % accuracy, they also did it at 50 Hz. Even though the success rate
is very high, the training overhead represents a significant burden to the implementation of neural
networks, being their biggest disadvantage.
Support Vector Machines are supervised machine learning algorithms that are usually used for re-
gression or classification tasks. Since the algorithm is supervised, for it to work properly, it needs
to be trained. The training generally creates a big overhead in implementation time
that is proportional to the number of signs to recognize. Nonetheless, when properly trained,
these algorithms have proven to be the most accurate classifiers, with 100 % accuracy [87]. Al-
though the results are astonishing, their major weakness when compared to other methods is the
achieved frame rate [85]. Only very recently, in 2012 [90], was a real-time
implementation on a real dataset achieved.
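The normalized cross-correlation at the core of template matching can be sketched on flattened grey-scale patches; the 6-pixel "sign pattern" is invented for illustration. Because the score normalizes out brightness and contrast, the same sign under different lighting still scores 1, whereas a rotated sign would not - which is why rotated template sets are needed.

```python
import math

# Normalized cross-correlation between a candidate patch and a stored template
# (both flattened grey-scale intensity lists of equal length). Scores near 1
# indicate a likely match.

def ncc(patch, template):
    n = len(patch)
    mp = sum(patch) / n
    mt = sum(template) / n
    num = sum((p - mp) * (t - mt) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mp) ** 2 for p in patch) *
                    sum((t - mt) ** 2 for t in template))
    return num / den if den else 0.0

template = [0, 0, 255, 255, 0, 0]     # stored sign pattern
bright = [10, 10, 250, 250, 10, 10]   # same sign under different lighting
score = ncc(bright, template)
```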
3 System Architecture and Components
Contents
3.1 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
In this chapter, all modules belonging to the system are described: their functionality within the whole
system, along with the proposed architecture. It is also explained how the hardware and software are
interconnected and how they fit in the global picture. A high-level block diagram depicting the covered
modules and their connections is shown in Figure 3.1.
Figure 3.1: Proposal of functional architecture, including both Hardware and Software modules
3.1 Hardware
The hardware comprises the mechanical and electronic components that are arranged inside the
robot frame. These components have to be within the robot frame in order to comply with the
competition's rules [1]. Thus, given the limited space available inside the car-like robot, the constituents
of the car are chosen based on three main characteristics: price, size, and performance/efficiency. Below,
each hardware unit that composes the autonomous vehicle is listed with the rationale for its inclusion.
3.1.1 Robot chassis
As described in Section 1.3, a non-holonomic robot was chosen to mimic as much as possible a
real-world scenario. The four-wheeled frame used in the competition is a 1:8 miniature of a real
car and is depicted in Figure 3.2. It has a steering system on the two front wheels, wheel suspension,
and a mechanical differential per axle.
Figure 3.2: Robot chassis overview.
The chassis was kindly donated to us and originally belonged to a Radio Controlled (RC) car with a gas
motor. To convert the old gas motor to an electric one and support the hardware, numerous 3D-printed
parts were created. All of those parts were developed in the context of the other parallel thesis, while this
thesis focuses on the integration of all sensors, which will be described further below.
3.1.2 Actuators
For the robot to achieve its basic functionalities while also acting like a real car, a set of actuators
were added to the frame, namely:
• Motor - To replace the old gas motor, an electric motor was chosen, as it allows more accurate
speed control, reduces the complexity of its control, and is more energy efficient. The motor in
use is a brushed Direct Current (DC) motor. When compared to brushless triphasic motors,
brushed motors draw less current and are also cheaper, at the expense of top speed. Nevertheless,
since the brushed motor is capable of achieving the target speed of 2 m/s, one is incorporated in
the robot.
• Steering - Contrary to conventional motors, where the signal controls the velocity of the rotor, servo
motors control the position of the rotor according to the input. For that reason, the use of servos for
direction control is standard across the majority of non-holonomic frames. Since the chosen
chassis already included a support to hold a servo, this standard was followed.
• Braking - Again using a servo, by controlling the position of the mechanical parts it's possible
to lock or unlock a free-spinning disk mounted on the transmission. As with the steering servo,
this is standard across the majority of radio-controlled car-like frames and, since the mechanical
part is not the focus of this thesis, the standard was followed. The necessary mechanical disks
and transmission supports had to be bought separately, as those were not included in the robot
chassis.
• Lights - As in a real-life scenario, when moving in low-light environments the driver is obliged,
by law, to turn on the lights, so that they can clearly see the road ahead while warning other
drivers of their presence. To address this problem with a real-world situation in mind, a pair of
lights was included in the robot frame. Light Emitting Diode (LED) lights are chosen over halogen
ones, as they are more energy efficient and provide enough luminosity for the cameras to work
inside a tunnel with no light source other than the robot's own lights.
3.1.3 Sensors
• RGB-D camera - To obtain both color and depth information, an RGB-D sensor is used. Its use is
twofold: the road's landmarks are easy to extract from a colored image, and the depth
information can be used to obtain an accurate position of those landmarks. Depth can also be
used to segment the road from its impassable obstacles in the colored image, while the color of
the objects is used to recognize them. A Microsoft Kinect One was added to fulfill this role due to
its cheap price, its depth accuracy, and low calibration complexity when compared to other RGB-D
cameras and even stereo cameras.
• Wide-angle camera - Since the Kinect's field of view is fairly limited - 70º in width by 60º in height,
according to Microsoft's documentation - a wide-angle lens camera is used to capture the
semaphores and traffic signs. The IDS camera currently serving this purpose was previously used
as the only vision sensor and is reused here to cover the missing field of view.
• Stereo camera - Due to its characteristics, a ZED stereo camera is used as the main sensor to
estimate the pose of the vehicle. Using its proprietary VO algorithm, the manufacturer specifies a very
high accuracy, with drift as low as 0.1 % of the traveled path. Since the robot chassis in use has a
suspension, installing wheel odometry would be a very complex task. Additionally, although an encoder
could be coupled to the motor, the play in the mechanical axle responsible for turning the wheels can
cause a difference of up to 5º in wheel angle for the same steering servo command. Since the
mechanical component of the robot is not the focus of this thesis, wheel odometry is discarded.
• Inertial measurement unit - As described in Section 2.1.2.C, an inertial measurement unit is also
incorporated in the robot to complement the information provided by odometry (be it visual or wheel
odometry). It comprises a 3-axis gyroscope, a 3-axis accelerometer, and even a 3-axis
magnetometer. This combination provides an accurate and cheap way of measuring the robot's
orientation with respect to the world. The selected IMU is the MPU-9250, as it is considered one of
the most accurate sensors in its price range.
• Infrared sensors - As in recent parking-assist systems, which inform the driver about the distance
to obstacles when the vehicle is moving backward, three infrared sensors are placed on the rear of
the vehicle. These cover the angles needed for the car to perform the parking maneuvers without
incurring any collision.
3.1.4 Hardware interface
• Microcontroller - To bridge the low-level sensors and actuators - essentially all hardware units
described above, with the exception of the cameras - and the computing platform, where the
sensor data is processed and navigation commands are calculated, a microcontroller is used. Due
to its low price, open-source community, built-in programmable ports, and large amount of memory,
an Arduino Mega was chosen over the remaining microcontrollers.
• Microcontroller shield - A custom Printed Circuit Board (PCB) is designed to facilitate the connections
and to support some of the sensors, actuators, and power regulators. The circuit is directly
mounted over the Arduino pins, a configuration known in the Arduino community as an (Arduino) shield.
The advantages of using a PCB are threefold: it has the necessary connectors for sensors and actuators
to be easily plugged or unplugged; it holds and connects the smaller electronic components; and it
regulates and distributes the battery's energy to the components that require an external power source.
• Motor driver - As previously stated, a DC motor is used to make the robot move. Most
microcontrollers only work with 5 V or 3.3 V logic pins that can source very low currents.
As the motor works at higher voltages and requires a high current draw, a driver is used to adjust
the microcontroller's output signal to the required motor input.
3.1.5 Computing platform
A mini-ITX form factor computer is specifically added to fit the space and cost requirements. Compared
to a laptop, this small form-factor computer reduces the overall cost and space requirements without
compromising performance. In fact, since laptops include many components that are not needed for the
competition, e.g., keyboard, screen, and body frame, the monetary resources can be applied where
needed. As the key factor of the competition is speed, the resources are allocated to meet the high
performance requirements. This includes not only a performant Central Processing Unit (CPU) but
also a performant Graphics Processing Unit (GPU). The idea is to exploit the graphics card's number-crunching
capabilities to tackle the most time-consuming algorithms, whether for localization, perception of
the environment, or navigation purposes.
3.2 Software
The software comprises the operating system and the modules developed to run on it. A high-level
description of the goal of each module and their interconnections is provided here, since their implementation
and technical details are covered in the next chapter, Chapter 4.
3.2.1 Operating System
On top of the computing platform, the Linux-based ROS is used. Although it has only 10 years of
existence [92], it's one of the most developed and widely used robotics frameworks nowadays. The
framework's main points of interest are its distributed and modular design, simulation capabilities, and its
open-source community.
Its modular design allows each team member to focus on the development of their own modules without
having to implement the communication mechanisms between those modules. The built-in visualizer,
ROS Visualization (RViz), and simulator, Gazebo, greatly contribute to the rapid development and
debugging of the implemented algorithms. These provide bootstrapping capabilities to both beginners and
experienced users.
3.2.2 Software model of the robot
ROS also supports customized 3D models of the robot. Although the model was developed
by Francisco in [6], it's of special relevance in this thesis, given that it provides the transformation
between each sensor and the robot's chassis, typically denominated the base link frame, and enables
the visualization of the robot in RViz. The base link frame provides a coordinate system that is fixed on
the robot. The sensors, in this case, have a fixed position in relation to the base link. The transforms
between the base link and the sensors are created using the second generation of the ROS transform
library, tf2 [93], and can be visualized in Fig. 3.3.
(a) Visual representation of the base link's transforms to the sensors and actuators, visualized in RViz. (b) Graph representation of the map, odom, and base link transforms.
Figure 3.3: tf2 transforms of the designed robot.
To indicate the absolute position of the robot in relation to the map, two additional transforms
are provided: odom and map. The odom frame provides the relative pose estimation transformation
between the base link and what the odometry sources perceive. The map transform corrects the
drift of the odometry sources, providing an absolute pose of the base link with respect to a
global coordinate system, generically known as map.
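As an illustration of how these frames chain together, the sketch below composes a hypothetical map → odom correction with an odom → base link estimate to obtain the absolute pose, using a minimal 2D transform type. This is plain C++ arithmetic, not the actual tf2 API, and all numeric values are invented for illustration.

```cpp
#include <cassert>
#include <cmath>

// Minimal 2D rigid transform (x, y in meters, theta in radians) used only
// to illustrate the arithmetic behind the frame chain; this is NOT tf2.
struct Pose2D {
    double x, y, theta;
};

// Compose two transforms: if `a` maps odom into map and `b` maps
// base_link into odom, the result maps base_link into map.
Pose2D compose(const Pose2D& a, const Pose2D& b) {
    return Pose2D{
        a.x + std::cos(a.theta) * b.x - std::sin(a.theta) * b.y,
        a.y + std::sin(a.theta) * b.x + std::cos(a.theta) * b.y,
        a.theta + b.theta};
}
```

For instance, composing a map → odom correction of (0.5, −0.1, 0) with an odom → base link estimate of (2.0, 0, 0) yields a base link pose of (2.5, −0.1, 0) in the map frame, which is exactly the role the localization node plays when it publishes the map → odom transform.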
3.2.3 Software modules
A high-level description of the software modules represented in Fig 3.1 is provided in this Section.
3.2.3.A Sensor calibration parameters
In order to reduce the overall uncertainty associated with the sensors' measurements, a dedicated
module to calibrate the sensors and discover their intrinsic characteristics is created. The calibration
is performed before the competition; the calibration information is then used during the
execution of the assigned tasks to ensure optimal sensor accuracy. These calibration parameters
are inserted directly into the microcontroller and provided as a configuration file to the software drivers.
3.2.3.B Ground, lines, and obstacles detection
With the sensors calibrated, a point cloud is output by the RGB-D sensor, the Kinect One. The
point cloud is, as the name implies, a set of data points spread in 3D space. From this point cloud, the
points belonging to the floor, where the robot is able to move, are segmented from the obstacles standing
on the track, where the robot is not able to move.
From this segmentation block, two laser scans with distinct functionalities are output, namely:
Virtual Walls - By analyzing the floor's colors using image processing techniques, the lines of the track
are identified and output as a laser scan, which can be thought of as virtual walls.
Obstacles - Objects within a certain height range are detected and output as a typical laser scan.
To fulfill these tasks, this module is split into three nodes: the segmentation of the RGB-D output, the
generation of virtual walls from the track's lines, and the detection of obstacles.
The virtual wall output is provided to the Global Localization and Mapping module to enable the usage
of occupancy-grid-based methods, such as MCL, while the detected obstacles are provided directly to
the navigation and decision-making modules to decide the robot's next actions.
3.2.3.C Relative Pose Estimation
By combining all the available state estimation sources, relative or not, that are able to estimate the
vehicle's pose, a corrected and more accurate source of odometry is provided. In the present work,
the sources available for pose estimation are the stereo camera, the ZED, and the IMU, the MPU-9250.
The corrected pose estimate is then provided to the Global Localization and Mapping module, which can
thus carry out its task with a higher degree of certainty about the robot's absolute pose.
This module also provides the dead-reckoning transform between the base link and odom frames.
3.2.3.D Global Localization and Mapping
With the information of the current pose and the virtual walls generated by the novel algorithm, it is
possible to model the environment and localize the robot within the created map.
This module is divided into two nodes. The first generates a map from the observed features by
correcting the odometry path through loop-closure techniques, which consist of recognizing when a
previously visited location is revisited. The second node uses the generated map and the aforementioned
inputs to provide an absolute localization of the robot in relation to the map.
The mapping node is used before the competition to store the identified features of the environment
so that they can be used during the execution of the challenges. This increases the scalability
of the developed system, as it is able to map virtually any environment.
The localization node implements a multi-modal method to estimate an absolute robot pose in relation
to the global map.
3.2.3.E Semaphore and Traffic Sign Recognition
Using the wide-angle camera's image input, this module analyzes the surrounding environment to
identify the presence of semaphores or traffic signs. Since traffic signs can be found anywhere on the
track but semaphores are only found near the zebra crossing, the module is split into two smaller
blocks, one for each use case, allowing the segmentation and parallelization of tasks. This is especially
important to save computing resources, as these blocks are executed on demand.
4 Implementation
Contents
4.1 Hardware Interface
4.2 Sensor calibration
4.3 Ground, lines and obstacle detection
4.4 Relative Pose Estimation
4.5 Global Localization and Mapping
4.6 Semaphores and Traffic Signs Detection and Recognition
The practical implementation of each of the modules displayed in Fig. 3.1 is outlined in this chapter.
This includes the development of the hardware components required to power and interface
with the low-level sensors, and a thorough description of the ROS packages created and used to fulfill
the system's requirements.
4.1 Hardware Interface
As described in Section 3.1.4, an Arduino microcontroller is used to extract the data from the low-level
sensors and command the actuators, and a circuit board is designed to organize and facilitate the
connections while supporting the smallest components. In this section, the microcontroller interface and
the circuitry needed to interface with the hardware are detailed.
4.1.1 Power regulation and interface with small sensors and actuators
A custom PCB is designed to provide an easy interface for all sensors and actuators and to distribute
power to all electronic components inside the robot's chassis, with the exception of the cameras,
which already possess a standard Universal Serial Bus (USB) connection, and of the computing platform
and motor, which have their own dedicated power supply and driver.
The Eagle software is used for the PCB design, since it's used by a large share of existing electronics
companies and there's an extensive number of freely distributed component libraries.
As depicted in Fig. A.1 of Appendix A, the PCB has the same size and format as the microcontroller,
the Arduino Mega. This was decided so that it can be directly mounted over the
microcontroller, saving space and requiring fewer physical cabled connections between the two.
The constituting blocks of the PCB are presented in Fig. A.2 of Appendix A. The blocks’ functions
and implementation details are described below:
• Named interfaces inscribed on the board with standard flat-cable connectors that cannot be
reversed. This not only facilitates the assembly of the hardware but also prevents the connectors from
being incorrectly connected, consequently avoiding damage to the sensors and actuators.
• As the battery's working voltage varies from 21 V to 24 V and the majority of the hardware
components work at lower voltages, power regulators are used to regulate the battery input. The Texas
Instruments LM338T is used due to its capability of outputting more than the predicted current
usage of 5 A, its price, and its simplicity.
• Support and incorporation of the smaller sensors and actuators, creating a direct connection to the
microcontroller and saving space and cabled connections. The smaller electronic
components directly incorporated in the board are the MPU-9250 with its logic level converter, and
a light switch.
The MPU-9250 needs a logic level converter to translate its 3.3 V logic to the 5 V logic of the
microcontroller.
The light switch is implemented with an NPN Darlington transistor pair, since it provides a high
current gain over the microcontroller's controlling signal, which is limited to 50 mA, to drive the
1 A LEDs. Also, the switch is only enabled at logic level '1'. This is especially important when
the sensors are already connected to the board but the microcontroller has not yet started: since
the microcontroller's outputs are at logic level '0' while it is disabled or initializing, the lights
stay off during that period.
With the sensors and actuators connected to the microcontroller, the sensor values are read and
the actuators commanded by an Arduino program.
4.1.2 Bridging between the low-level hardware and the computing platform
As described in Section 3.1.4, an Arduino Mega is used to bridge the lower-level sensors
and actuators and ROS.
To accomplish this, the microcontroller iteratively reads each sensor value and sends
it via UART to the computing platform. However, when a message is received via UART from the
computing platform, the Arduino interrupts its current execution and applies the most recent command to
the actuators. The connections and protocols used between each hardware unit, the Arduino, and the
computing platform are detailed below:
• Inter-Integrated Circuit (I2C) is the protocol used by the MPU-9250 to communicate. The protocol
requires two wires, one for the clock and one for the data, and provides a synchronous communication
channel between multiple master and slave devices. The master devices drive the clock
signal and request data from the slave devices, while the slave devices simply respond to the
requests performed by the master. In this case, the Arduino is the only master device and requests
the values of the slave device's sensors, namely the gyroscope, accelerometer, and compass of
the MPU-9250.
• The Universal Asynchronous Receiver-Transmitter (UART), or Serial as it's known in the Arduino
environment, is the protocol used by the Arduino to interface with a computer. The Arduino has a built-in
USB-to-Serial chip, enabling the connection to a computer via the standard USB protocol. On
this communication channel, the Arduino sends the sensor values to be interpreted and receives the
actuator commands.
• Generic Input/Output (I/O) pins are programmable pins whose function is defined in the microcontroller's
initialization routine. The function each of these programmable pins assumes depends on the
hardware connected to it. For that reason, the hardware components using these pins are
listed:
– The range sensors are connected to Analog pins, i.e., pins with an Analog-to-Digital Converter
(ADC), since these sensors output a voltage ranging from 0 V to 5 V.
– To send the velocity and angle commands to the motor and servos, respectively, Pulse Width
Modulation (PWM) pins are used. PWM is a technique to approximate an analog value
using digital signals. A digital signal only assumes two discrete levels, '1' or
'0'. The technique simulates an analog signal by switching the logic level at a very high
frequency while controlling the fraction of time that the signal spends at logic level '1'. With this, it's
possible to output a simulated analog value ranging from the voltage of logic level
'0', which is 0 V, to that of logic level '1', which corresponds to 5 V.
– Given the simplicity of the light switch, a simple Digital pin is enough to turn the lights on
or off. This is accomplished by sending a signal at logic level '1' or '0' to turn the lights
on or off, respectively.
To make the most of ROS' capabilities, the Arduino uses rosserial [94] to receive and send
ROS messages directly from or to a ROS node via its Serial port. These messages are standardized
and encouraged by the ROS community, hence the standards are followed. The messages exchanged
by the microcontroller and the computing platform, where ROS is installed and the nodes are executing, are:
• sensor_msgs/Imu holds the values of the accelerometer and gyroscope converted from the MPU-9250's
raw values to m/s² and rad/s, respectively, and the absolute orientation provided by the
compass, expressed as a quaternion.
• sensor_msgs/Range carries the data of a single infrared range sensor, converted to m.
• geometry_msgs/Twist contains the commands provided by the Navigation and Control modules
that are applied to the actuators.
4.2 Sensor calibration
To allow the correction of systematic errors and reduce the overall uncertainty associated with each
sensor measurement, each sensor is calibrated and its intrinsics are stored. Not only does calibration
allow the reduction of uncertainty, it also enables the conversion of the raw values to International
Organization for Standardization (ISO) units, which are used in the ROS messages as described
in Section 4.1.2.
The sensors' calibration is done at two different levels. For the sensors connected to the microcontroller,
an Arduino calibration program is available. The sensors directly connected to the computing
platform, namely the cameras, have their own calibration methods embedded in the drivers. The
calibration methods are described in detail in the following subsections.
4.2.1 Low-level sensors calibration
To accelerate the low-level sensors' calibration, an Arduino calibration program is created. It provides
two different methods to calibrate the sensors involved, namely:
• a semi-automated way to calibrate the IMU's accelerometer, gyroscope, and magnetic compass.
It's semi-automated since it's necessary to excite the IMU along all axes by moving the robot in a
figure-eight pattern. After the calibration process, the obtained calibration values are exported to
the Arduino code to compensate for the IMU sensors' biases and scale factors.
• a semi-automated way to export the raw values of the infrared range sensors captured under different
conditions, which allows a quicker conversion of these raw values into ISO units. Since different
materials have different reflection coefficients, the calibration program accelerates the data
collection: the user sets the distance and the program outputs the corresponding raw values.
The source-code of the program can be found on the project’s GitHub repository [95].
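As an illustration of how the exported calibration samples might be turned into distances at run time, the sketch below linearly interpolates over (raw ADC value, meters) pairs. The table values used in the usage note are invented for illustration, not the thesis' measured data:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical conversion of an infrared sensor's raw ADC value to meters
// by linear interpolation over calibration samples.
double raw_to_meters(int raw, const std::vector<std::pair<int, double>>& table) {
    // `table` is assumed sorted by raw value, ascending; clamp outside it.
    if (raw <= table.front().first) return table.front().second;
    if (raw >= table.back().first) return table.back().second;
    for (std::size_t i = 1; i < table.size(); ++i) {
        if (raw <= table[i].first) {
            double t = double(raw - table[i - 1].first) /
                       double(table[i].first - table[i - 1].first);
            return table[i - 1].second + t * (table[i].second - table[i - 1].second);
        }
    }
    return table.back().second;
}
```

With an invented table such as {(100, 0.80 m), (300, 0.40 m), (600, 0.15 m)}, a raw reading of 200 would interpolate to 0.60 m; a real table would hold one such curve per sensor and target material.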
4.2.2 Kinect sensors calibration
Regarding camera calibration, Microsoft's Kinect is the only sensor requiring it. The
wide-angle camera's sole purpose is to provide an image to detect traffic and vertical signs, which doesn't
require knowing the camera parameters in the first place. As for the stereo camera, the ZED, it is factory
calibrated, requiring no additional attention.
Since the Kinect One possesses two different sensors that are combined to form a single RGB
image with depth information, the RGB and Infrared (IR) sensors' intrinsics are calculated individually
and then combined. The calibration program is already present in the Kinect's ROS driver,
iai_kinect2 [96], which is based on the calibration methods provided by the Open Source Computer Vision
Library (OpenCV) [97]. The sensors' intrinsics are described using the pinhole camera model. With this,
it's possible to correct the misalignments between the RGB and IR grids, generating an output point cloud
with less error.
4.3 Ground, lines and obstacle detection
Using the Kinect's point cloud output, the ground plane is segmented from the surrounding
environment. From the ground point cloud, the road lines are extracted and treated as virtual walls
to enable the usage of occupancy-grid-based mapping and localization algorithms. From the remaining
point cloud, obstacles with height are detected and passed to the navigation module.
This section explains the current implementation of the nodes that convert the Kinect's
sensor_msgs/PointCloud2 into two sensor_msgs/LaserScan messages, one being the virtual walls generated from
the road lines and the other the detected obstacles. To accomplish this, three nodes are used, namely,
the point cloud segmentation block, the line extraction block, and, finally, the obstacle detection block.
The first two blocks are developed for the present master's thesis and, for that reason, the algorithms
behind them are thoroughly described in the following subsections.
4.3.1 Ground plane segmentation
In line with what was described, this node is responsible for separating the points of the Kinect's point
cloud belonging to the ground plane, where the track is, from the remaining objects.
4.3.1.A Algorithm
The implementation of this algorithm is heavily supported on the Point Cloud Library (PCL) [98] and
is a result of the following sequence of operations:
1. Given that the Kinect's coordinate system is not aligned with the base link coordinate frame, the
point cloud first undergoes a frame transformation. This transformation keeps the coordinate
frames consistent, rendering the code easier to read and understand, but is needed
most importantly because the virtual laser scan must be provided in the robot's frame.
2. After that, Pass Through and Voxel Grid filters are applied to downsample the input cloud, using
the pcl::PassThrough and pcl::VoxelGrid classes, respectively. Each filter is further described below:
• The Pass Through filter, as the name implies, removes all the points that are not within its
specified limits. Since the Kinect's maximum range is specified as 5 m, points
farther than that distance are removed.
• The Voxel Grid filter divides the space of an input point cloud into 3D boxes, also called voxels,
whose size can be defined. Within each box, all points are replaced by their
centroid and average color. This filter is used not only to downsample the point cloud but also
to keep a more even density distribution of the points, since the density of points drastically
decreases with distance.
These operations heavily reduce the number of points and, consequently, the computational
complexity of the tasks carried out ahead.
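The centroid-per-voxel idea behind the Voxel Grid filter can be sketched as follows; pcl::VoxelGrid additionally averages color and is far more optimized, so this is only an illustration of the technique, not the class's implementation:

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <tuple>
#include <vector>

struct Pt { double x, y, z; };

// Bucket every point into a cubic cell of side `leaf` and replace each
// cell's contents by their centroid.
std::vector<Pt> voxel_downsample(const std::vector<Pt>& cloud, double leaf) {
    std::map<std::tuple<long, long, long>, std::pair<Pt, int>> cells;
    for (const Pt& p : cloud) {
        auto key = std::make_tuple(static_cast<long>(std::floor(p.x / leaf)),
                                   static_cast<long>(std::floor(p.y / leaf)),
                                   static_cast<long>(std::floor(p.z / leaf)));
        auto& cell = cells[key];  // zero-initialized accumulator on first use
        cell.first.x += p.x;
        cell.first.y += p.y;
        cell.first.z += p.z;
        cell.second += 1;
    }
    std::vector<Pt> out;
    for (const auto& kv : cells) {
        int n = kv.second.second;
        out.push_back({kv.second.first.x / n, kv.second.first.y / n,
                       kv.second.first.z / n});
    }
    return out;
}
```

With the 0.025 m leaf used later in this section, any cluster of raw Kinect points falling inside the same 2.5 cm cube collapses into a single representative point.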
3. To identify the ground plane, plane models are fitted to the point cloud and the
angle between each detected plane and the ground is calculated iteratively. The computation of the
planes' normal vector coefficients is carried out by the pcl::SACSegmentation class.
The class uses robust estimators to compute the parameters of a mathematical model from a set
of observed data that contains outliers. To check for the presence of the ground, the algorithm
iteratively performs the following steps:
(a) Starting from the biggest plane in the input data set, RANdom SAmple Consensus
(RANSAC) estimates the plane's normal vector coefficients iteratively, removing the outliers
that are not within a defined distance threshold. With the plane's coefficients estimated, the
angle, θ, between the detected plane's normal vector, np, and the XY plane's normal vector,
i.e., the ground plane's normal vector, ng, is calculated using the equation:

θ = arccos( (np · ng) / (|np| |ng|) ),    (4.1)

where |np| and |ng| represent the magnitudes of the detected plane's and ground plane's normal
vectors, respectively. The magnitude of a vector, v, is calculated by resorting to the equation:

|v| = √(vx² + vy² + vz²),    (4.2)

where vx, vy, and vz are the x, y, and z components of the vector, respectively.
(b) If the calculated angle, θ, is within a user-defined threshold, the detected plane is considered
the ground plane. Otherwise, the computed plane is removed from the initial data set and a new
plane estimation is performed.
(c) This process repeats until a user-defined percentage of the original point cloud data has been
processed. If this limit is reached without the angle condition being met, it's considered
that there is no ground plane and the algorithm waits for the next point cloud to arrive.
4. When the ground is detected, the ground inliers are extracted using the pcl::ExtractIndices
class, which simply extracts the identified inliers' indices from an input point cloud into a new point
cloud. The remaining indices of the original point cloud are considered the obstacles within the
track. These two point clouds, representing the detected ground and obstacles, are the output of
this block.
The different point cloud transformations are visually represented in Fig. 4.1 using RViz.
(a) Kinect's output point cloud. (b) Downsampled point cloud. (c) Segmentation of the ground point cloud, represented in white, and the obstacles point cloud, represented in green.
Figure 4.1: Visual description and representation of the ground and obstacles segmentation algorithm.
4.3.1.B User-Defined Inputs
• Leaf size - The size, in meters, of the side of the cube used by the Voxel Grid filter.
• Ground detection angle - The angle threshold between a plane detected in the point
cloud and the ground plane, below which the detected plane is considered the ground.
• Processed cloud points - The percentage value, ranging from 0 to 1, used in step 3c to consider the
original point cloud processed. If the value equals 1, the stopping condition is only met when all
points of the cloud have been processed. On the other hand, a value of 0 causes the algorithm to
only consider the plane with the most points, so there is no loop.
• Estimation distance - Represents the distance threshold, in meters, used by RANSAC for a point
to be considered an inlier of a plane during the estimation.
4.3.1.C Implementation Details
The implemented voxel grid leaf size is 0.025 m. Since the lines have a 0.05 m thickness, the
downsampled point cloud will contain two points across a line's width, which is enough to detect it. With
this size, the number of points is heavily reduced, drastically increasing the speed of the algorithm. The
same leaf size is used for RANSAC's estimation distance, so that a point is only considered an inlier if it
is, at most, one voxel away from the estimated plane.
The present implementation uses RANSAC as the robust estimator, since it's simple and delivers
both accuracy and speed. The SACSegmentation class allows other robust estimators; however,
those are based on RANSAC and are more complex. RANSAC provides the desired performance,
hence the decision to use it.
Since the robot chassis in use has a suspension, the camera may be subject to oscillation,
increasing the angle between the camera and the ground plane. To accommodate this and
keep a safe margin, a detection angle threshold of 5º is used.
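The ground test of Eqs. 4.1 and 4.2 with this 5º threshold can be sketched as follows; this is a stand-alone illustration of the geometric check, not the pcl::SACSegmentation internals:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Eq. 4.2: vector magnitude.
double magnitude(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Eq. 4.1: angle between a candidate plane normal and the ground normal.
double angle_between(const Vec3& np, const Vec3& ng) {
    double dot = np.x * ng.x + np.y * ng.y + np.z * ng.z;
    return std::acos(dot / (magnitude(np) * magnitude(ng)));
}

// Accept the plane as ground when the angle to the XY plane's normal
// (the z axis, since the cloud is already in the base_link frame) is
// below the threshold (5 degrees ~= 0.0873 rad).
bool is_ground(const Vec3& plane_normal, double threshold_rad) {
    const Vec3 ng{0.0, 0.0, 1.0};
    return angle_between(plane_normal, ng) <= threshold_rad;
}
```

A perfectly horizontal plane (normal along z) passes the test, a slightly tilted one caused by suspension sway still passes, and a vertical surface such as an obstacle's side is rejected.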
4.3.2 Road lines extraction
Using the ground point cloud, a scaled image is created and processed to extract the track's lines.
PCL does not provide the necessary color processing tools to achieve this and, since the ground point
cloud lies on a 2D surface, like an image, the current implementation relies on OpenCV instead.
4.3.2.A Algorithm
To transform the input point cloud into a virtual laser scan, the following operations are
performed:
1. The minimum and maximum values of the x and y axes of the input cloud are stored to enable
the conversion from the robot's coordinate frame to the image coordinate frame. This also allows
the calculation of the minimum width and height needed to store the image with no margins.
2. With those values stored, a scaled image is created, where each pixel contains the information
of a single voxel from the ground segmentation block. In fact, when the Kinect's point cloud is
downsampled, a single colored point remains in each voxel, so the ground point cloud is already
organized as a grid. Each pixel of the scaled image stores the RGB information of one voxel and,
although the two lie in different coordinate systems, the coordinates can be converted using the
equations:

Iy = (Px − Rx) / ps,
Ix = (Py − Ry) / ps,    (4.3)

where Ix and Iy represent the image coordinates along the x and y axes, Rx and Ry
represent the values of the robot's coordinate frame along the x and y axes, Px and Py are the
furthermost points of the cloud on the x and y axes, respectively, and ps is a constant that represents
the size of each pixel, in meters, which is the same as the size of a voxel.
3. Then, the created image is converted from the RGB color space to the CIELAB color space, which allows the separation of the luminosity component of the image from the color components. The advantages of using the CIELAB color space are twofold:
• Since each color has its own dedicated channel, contrary to RGB, it's easier to create a color filter that detects only white, the color of the road's lines.
• As the luminance component is in a dedicated channel, the color channels vary much less than in the RGB color space when the lighting conditions change. This also increases the resilience of the algorithm to different lighting conditions.
By applying the color space conversion and color filter, a binary image is generated. In it, white pixels are considered road lines and black pixels are considered lanes or free space.
4. With the color filter, both continuous and non-continuous lines are captured. Since the robot can drive over non-continuous lines, those should not be detected as virtual walls. To remove them from the binary image, the area of every object, in pixels, is calculated using OpenCV's findContours function. If an object's area doesn't reach a minimum threshold defined by the user, it is removed from the image.
5. With only continuous lines left on the image, to conclude the process, the virtual laser scan is created by generating a number of virtual beams that start from the car's origin, base_link's (0, 0) point, and extend until they reach a white pixel. To calculate the pixel's x and y components in the car's coordinate frame, Eq. 4.3 is used. Then, the magnitude of the beam, which corresponds to the effective distance from the pixel to the robot, is calculated using the vector magnitude equation already shared in Eq. 4.2, assuming that the z component is 0. The number of virtual beams depends on the sensor's field of view and the angle increment, both input by the user. If no white pixel is found, the laser beam is set to the value 0.
The described procedure's steps can be visually observed in Fig. 4.2.
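The coordinate conversion of Eq. 4.3 can be turned into a small helper, sketched below under the stated conventions (Px, Py as the furthermost cloud coordinates, ps the pixel size), together with the inverse mapping used in step 5. Function names are illustrative, not the thesis code.

```python
def robot_to_image(rx, ry, px_max, py_max, ps):
    """Map robot-frame coordinates (m) to image pixel indices via Eq. 4.3."""
    iy = round((px_max - rx) / ps)
    ix = round((py_max - ry) / ps)
    return ix, iy

def image_to_robot(ix, iy, px_max, py_max, ps):
    """Inverse mapping, used when converting beam hits back to the car's frame."""
    rx = px_max - iy * ps
    ry = py_max - ix * ps
    return rx, ry
```

For example, a point at (2.0 m, 0.5 m) in the robot frame, with furthermost cloud coordinates (2.5 m, 1.0 m) and 0.025 m pixels, maps to pixel (20, 20).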
4.3.2.B User-Defined Inputs
• Angle increment - The angle interval, in degrees, between each virtual laser beam. The lower
the value, the more beams are generated for the field of view of the Kinect.
• Line minimum area - The minimum area, in m2, to consider a line continuous.
• Luminosity range - Since the only variable that can affect the color filter is the lighting condition of the environment, as the color of the line doesn't change, it is set as a user parameter. It represents the range of CIELAB luminosity values applied in the color filter to detect the line.
4.3.2.C Implementation Details
As the luminosity threshold depends on the conditions of the environment, a script is created to visualize the effect of the luminosity range and calibrate it. As depicted in Fig. 4.3, the script allows the user to change the range and visualize the output live. The same figure also demonstrates that, to detect the white line, the luminosity has to range from 130 to 255.
Regarding the detection of the continuous line only, as the size of each dash of the dashed line is fixed, 0.2 m by 0.05 m, the minimum detection area can be calculated directly. Since the resulting area is 0.01 m², the minimum area is set to 0.015 m² to include a safety margin.
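The dash-removal idea, computing each connected object's pixel area and discarding small ones, can be sketched in pure Python as a stand-in for the OpenCV findContours-based filter described above (the minimum area here is in pixels rather than m²; converting is a division by ps²):

```python
from collections import deque

def filter_small_components(binary, min_area):
    """Keep only white 4-connected components with at least min_area pixels."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] and not seen[sy][sx]:
                comp, queue = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_area:  # small components (dashes, noise) are dropped
                    for y, x in comp:
                        out[y][x] = 1
    return out
```

On a toy grid with a ten-pixel line and a three-pixel dash, a five-pixel threshold keeps only the line.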
From the results obtained in Chapter 5, an angle increment of 1° is demonstrated to be enough to provide the expected results in terms of accuracy.
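Step 5 of the algorithm in Section 4.3.2.A can likewise be sketched as a simple pixel-space ray march. The field of view, origin convention, and helper names below are illustrative assumptions, not the exact implementation:

```python
import math

def cast_virtual_beams(binary, origin, ps, fov_deg=60.0, inc_deg=1.0, max_steps=200):
    """March one virtual beam per angle from the origin pixel until a white pixel.

    binary: 2D grid of 0/1 values; origin: (row, col) of base_link's (0, 0) point;
    ps: pixel size in meters. Returns one range [m] per beam, 0.0 when no line is hit.
    """
    oy, ox = origin
    ranges = []
    for i in range(int(fov_deg / inc_deg) + 1):
        ang = math.radians(-fov_deg / 2.0 + i * inc_deg)
        hit = 0.0
        for step in range(1, max_steps):
            y = oy - round(step * math.cos(ang))  # forward = decreasing row index
            x = ox + round(step * math.sin(ang))
            if not (0 <= y < len(binary) and 0 <= x < len(binary[0])):
                break  # beam left the image without hitting a line
            if binary[y][x]:
                hit = step * ps  # beam magnitude: steps along the ray times pixel size
                break
        ranges.append(hit)
    return ranges
```

For a horizontal line of white pixels 30 steps ahead of the origin, the central (0°) beam reports 30 · 0.025 m = 0.75 m.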
4.3.3 Obstacle Detection
With the output of the ground segmentation block, a point cloud containing only the impassable
objects that are standing on the track is processed to output a laser scan message. With this, two
distinct objects of the competition are detected, namely, the lane obstacles, which are green boxes that
(a) Ground segmented point cloud visualized in RViz. (b) Scaled image generated from the ground point cloud.
(c) CIELAB color filter output image. (d) Removal of the dashed line and noise.
(e) Visual representation of the virtual beams in red. (f) Ground segmented point cloud and virtual laser scan output, in red, visualized in RViz.
Figure 4.2: Visual description and representation of the generation of the virtual laser scan from an input colored point cloud.
occupy the full width of a lane, and the in-construction zone’s cones. Both are output as a laser scan
message as this is the requirement imposed by the navigation and decision-making modules created on
a parallel thesis [6].
To accomplish this, the point cloud is iterated to identify objects within a specified height and range from the moving robot. Although this could easily be achieved with PCL, the pointcloud_to_laserscan [99] package already implements this functionality and is used instead.
(a) No filter applied (b) Forward Template
Figure 4.3: Script interface used for the calibration of the CIELAB’s channels range values.
4.3.3.A User-Defined Inputs
• Minimum and Maximum Height - Define the minimum and maximum values, in meters, on the z axis in which to search for obstacles. Points outside of this range are ignored.
• Minimum and Maximum Distance - Define the minimum and maximum radial distance within which points must lie. Points outside of this radius are ignored.
• Angle increment - The angle interval, in radians, between each laser beam. The lower the value,
the higher the resolution of the output laser scan.
4.3.3.B Implementation Details
The obstacles' height, according to the competition rules, ranges from 0.2 m to 0.3 m. Thus, the search height is set to start from the ground, 0 m, and to end at the maximum height, 0.3 m. For the minimum and maximum ranges, the specifications of the Kinect are applied, which correspond to 0.5 m and 5 m.
From the experimental tests carried out in Chapter 5, an angle increment of 1° is enough to identify the cones and track obstacles at the maximum range distance.
The output laser scan of the module is depicted in Fig. 4.4.
Figure 4.4: Visual demonstration of the output of the obstacle detection node, in red, from the input obstacles point cloud, in green.
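The height- and range-windowed conversion performed by the pointcloud_to_laserscan package can be approximated by the following sketch. The parameter defaults mirror the values chosen above; this is an illustration of the idea, not the package's code:

```python
import math

def points_to_scan(points, angle_min=-math.pi / 4, angle_max=math.pi / 4,
                   angle_inc=math.radians(1.0), z_min=0.0, z_max=0.3,
                   r_min=0.5, r_max=5.0):
    """Bin 3D obstacle points into a planar scan, keeping the closest hit per angle."""
    n = round((angle_max - angle_min) / angle_inc) + 1
    ranges = [float('inf')] * n
    for x, y, z in points:
        if not z_min <= z <= z_max:
            continue  # outside the obstacle height window
        r = math.hypot(x, y)
        if not r_min <= r <= r_max:
            continue  # outside the sensor's radial window
        ang = math.atan2(y, x)
        if angle_min <= ang <= angle_max:
            i = round((ang - angle_min) / angle_inc)
            ranges[i] = min(ranges[i], r)  # closest obstacle wins the angular bin
    return ranges
```

A point above the 0.3 m window is discarded, and of two obstacles on the same bearing only the nearer one appears in the scan.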
4.4 Relative Pose Estimation
As unveiled in Section 3.1.3, cameras are used to estimate the pose of the vehicle through Visual Odometry techniques instead of wheel odometry. However, both methods are based on a dead-reckoning approach and are, thus, relative methods subject to cumulative errors. To counteract the errors associated with the visual odometry, besides calibration, it is combined with an IMU to provide a more accurate and stable pose estimate. In this section, the implementation of all modules that contribute to the pose estimation is covered.
4.4.1 Visual Odometry
The ZED stereo camera is mounted on the robot frame with the main purpose of providing a reliable source of odometry. The creators of the ZED provide an SDK in which the pose estimate is calculated. However, the SDK is not open-source and, consequently, it's not possible to examine the implemented algorithm.
Nevertheless, they also provide a ROS wrapper, named zed_ros_wrapper, which encapsulates the ZED SDK functions and enables its usage over ROS. The package provides a 3D pose and the pose covariance.
4.4.1.A User-Defined inputs
The package provides a very limited set of parameters with which to tweak the obtained pose and, consequently, its accuracy. The parameters that were confirmed to influence the pose accuracy
are:
• Resolution - The size in pixels of each camera image. One can opt for: 2.2K (2208x1242), 1080p (1920x1080), 720p (1280x720), or WVGA (672x376).
• Frame rate - The number of frames captured by each camera per second. Depending on the resolution, the frame rate varies from 15 Frames Per Second (FPS) to 100 FPS.
• Floor Alignment - When set to true, the VO algorithm will attempt to identify the ground and take it into consideration in the pose estimate.
• Pose Smoothing - When set to true, the output pose is smoothed, i.e., small discrete jumps in any state variable of the vehicle's pose are filtered out.
4.4.1.B Implementation details
The ZED allows different combinations of resolution and frame rate depending on the application's needs. The experimental tests carried out show that higher frame rates are required for good accuracy, especially at higher speeds. However, since the ZED combines image and depth information to track the 3D pose of the camera, the higher the resolution, the more accurate the depth information and, consequently, the pose. In order to maintain a good balance between resolution and frame rate, i.e., good accuracy at both lower and higher velocities, a resolution of 720p at 60 frames per second is utilized.
Both the Floor Alignment and Pose Smoothing parameters are enabled, since both were demonstrated to improve the output results. Floor Alignment prevents the ZED from drifting on the z axis, i.e., from estimating an upward or downward movement, since it always keeps track of the height to the floor, drastically improving the accuracy. Pose Smoothing prevents the pose from drifting when the camera is almost steady, thus improving the pose estimate when the car is stopped.
4.4.2 Sensor Fusion
To fuse all the available sensors capable of outputting a state variable, the robot_localization [100] package is used, due to its numerous advantages: it allows an arbitrary number of sensors; it supports the standard messages used for this effect, namely nav_msgs/Odometry, sensor_msgs/Imu, and geometry_msgs/PoseWithCovarianceStamped; and it provides a continuous state estimation even when one of the sensors temporarily provides no information. To perform the state estimation, an Extended Kalman Filter is used.
4.4.2.A User-Defined Inputs
Given that the robot_localization package allows an arbitrary number of sensors, even sensors that provide the same information, the number of inputs and each sensor's individual configuration have to be provided via a configuration file. In the present implementation, the relevant inputs are:
• 2D mode - Although the present sensor fusion algorithm uses a 3D omnidirectional motion model, as the name of the parameter implies, it's possible to ignore the z, roll, and pitch state variables in order to output a 2D pose. This effectively zeroes those state variables, which may be adequate in situations where the robot operates in a planar environment.
• Sensor's State Variables Configuration - The robot_localization package tracks a 15-dimensional state of the vehicle (x, y, z, roll, pitch, yaw, ẋ, ẏ, ż, ˙roll, ˙pitch, ˙yaw, ẍ, ÿ, z̈), where each sensor can contribute to each of these state variables. To define whether a sensor contributes to a specific state variable, a matrix of boolean values has to be provided. By setting a boolean to true, the sensor fusion algorithm will incorporate that sensor's readings into that specific state variable's estimate; otherwise, it won't. The state variable configuration matrix has the form presented in the equation:

[ x      y      z
  roll   pitch  yaw
  ẋ      ẏ      ż
  ˙roll  ˙pitch ˙yaw
  ẍ      ÿ      z̈ ]    (4.4)
• Covariance of the state variables - The covariance is used by the sensor fusion algorithm to evaluate the confidence, or weight, of the estimate that a sensor provides for a particular state variable. The smaller the covariance value a sensor reports for a state variable, the more confidence is given to that sensor's estimate of that variable.
• Relative - If the sensor has an absolute source of pose estimation, this parameter should be set to false; if the sensor is relative, i.e., estimates the pose with a dead-reckoning approach, it should be set to true. When set to true, the sensor fusion algorithm interprets the first input value as a 'zero' reference.
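The covariance-as-weight idea can be illustrated with a scalar inverse-variance fusion, a simplified analogue of how the EKF weighs low-covariance sensors more heavily. The function and the numeric values are illustrative, not robot_localization internals:

```python
def fuse(estimates):
    """Fuse scalar (value, variance) estimates by inverse-variance weighting."""
    weights = [1.0 / var for _, var in estimates]  # low variance -> high weight
    total = sum(weights)
    value = sum(v * w for (v, _), w in zip(estimates, weights)) / total
    return value, 1.0 / total  # fused variance is smaller than either input

# e.g. a yaw estimate with low covariance dominates one with higher covariance:
fused_yaw, fused_var = fuse([(1.0, 0.01), (2.0, 0.09)])
```

With these inputs, the fused value lands at 1.1, much closer to the low-covariance estimate, which mirrors the yaw configuration discussed below.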
4.4.2.B Implementation Details
The current competition track is on a planar surface, so there are only subtle variations in the ground plane. Since these are of little relevance, the 2D mode is enabled, effectively zeroing the z, pitch, and roll state variables.
Considering that the two sensors capable of providing an estimate to the state variables are the ZED
and the MPU-9250, those were individually configured.
For the ZED, the configuration matrix is set to use the x, y, and yaw values. Since it's based on a dead-reckoning pose estimation approach, the relative parameter is enabled.
For the IMU, the configuration matrix is set to use the yaw, ˙yaw, ẍ, and ÿ values. Since the IMU has a compass, which provides an absolute orientation with respect to Earth's magnetic field, the relative parameter is disabled.
The only state variable provided by both sensors is the yaw. As will be shown in the next chapter, since the IMU has a higher accuracy in the yaw estimate, its covariance was deliberately set to a lower value than the ZED's.
4.5 Global Localization and Mapping
4.5.1 Global Map
To carry out the task of creating an Occupancy-Grid map to serve as a base for the localization and navigation modules, OpenSlam's GMapping [101] package is used, as it was demonstrated in [102] to be the most accurate algorithm when compared to its competitors.
Since the environment is unknown, GMapping uses a Rao-Blackwellized PF to solve the simultaneous localization and mapping problem. The key idea of this PF is that both the trajectory, x1:t, and the map, m, are estimated from the detected environment features, named observations, z1:t, and the odometry measurements, u0:t−1, by taking advantage of the factorization presented in the equation:

p(x1:t, m | z1:t, u0:t−1) = p(m | x1:t, z1:t) · p(x1:t | z1:t, u0:t−1). (4.5)
This way, it is possible to calculate the posterior of the map and trajectory in two distinct steps,
allowing an efficient computation. In this case, each particle corresponds to a map that is built from all
observations and potential trajectories.
4.5.1.A User-Defined Inputs
For the GMapping algorithm to be as effective as possible in different scenarios, numerous parameters can be defined to adjust it to the environment at hand. The most relevant inputs for GMapping in the current implementation are:
• Map Resolution - The resolution of the occupancy grid defines, in meters, the size of each cell in
the output map. The smaller the size of each cell, the more granular is the output map.
• Map Size - The full length and width of the map, in meters, are defined by the difference between the maximum and minimum values of the x and y axes.
• Particles - Since GMapping is implemented with a finite number of particles, the higher the number of particles defined by the user, the more posteriors are computed and, potentially, the more accurate the output map.
• Odometry error - The odometry errors in translation and rotation are modeled inside GMapping
as a function of translation and rotation. Decreasing this value causes the algorithm to trust the
odometry more, computing trajectories closer to the odometric path.
• Matching score - A minimum score to consider the outcome of the scan matching valid can be provided, avoiding jumps in the pose estimates in large open spaces. This is especially important for laser scanners with limited range, which is the case here.
4.5.1.B Implementation Details
The current implementation defines that the size of each cell of the Occupancy-Grid is 0.025 m to
match the granularity of the downsampled point cloud that is provided to calculate the Kinect’s virtual
laser scan.
The provided length and width are 20 m and 10 m, respectively. As the track is 16.75 m by 6.95 m, it fits with some margin for error.
To increase the number of loop closure detections, the number of particles was increased from 30 to 100. It's also observed that increasing the number of particles increases the similarity between the real and generated maps.
In the majority of the literature, GMapping included, it is assumed that the odometry error is orders of magnitude larger than the error of the laser. However, in the present case, since the laser scan is virtually created, this assumption does not hold. For that reason, and since the odometry error is quite small, as demonstrated in Chapter 5, the expected odometry error is halved while the matching score is increased to 100, to prevent incorrect matchings caused by the lower accuracy of the virtual laser scan.
4.5.2 Absolute Localization
In consonance with what is stated in Section 3.2.3.D, a multi-modal localization approach is required
in order to deal with the global localization problem and, to some extent, the kidnapped robot problem.
To deal with this problem, the AMCL approach of Fox et al. [20], which was already described in Section
2.1.1.C, is implemented using the amcl package [103].
Due to its capacity to change the number of particles depending on the approximation error, or weight, of each particle, and its ease of implementation over a known map, AMCL is widely adopted in the robotics industry. For all those reasons, the current localization module implements AMCL.
4.5.2.A User-Defined Inputs
The ease of implementation comes also from the well structured and described parameters. The
ones that are more relevant for the current implementation of the localization module are:
• Initial Pose and Covariance - The initial pose represents the exact place on the map from which the robot starts the localization, while the initial covariance represents, in meters, the distribution of each state variable around the initial location. The higher the covariance, the more spread out are the particles in the initial state.
• Particles - Since the number of particles is adaptive, it’s possible to configure both the minimum
and maximum particles used by the algorithm.
• Odometry type - The current implementation of AMCL considers two odometry models for the
moving robots, either omnidirectional or differential.
• Odometry alphas - The odometry alphas in AMCL allow the user to model the odometry error, where the expected noise of the odometry's translational and rotational components is provided with respect to the translational and rotational motion of the robot. In practice, the higher these values, the more spread out the particles will be around the odometric path during the sampling phase.
4.5.2.B Implementation Details
According to the competition rules, the autonomous vehicle always starts from the same location. Given this, the initial robot pose with respect to the map frame is provided with a small covariance value, since the confidence that the robot is exactly at that pose is very high.
In line with the previous chapters, the built robot is non-holonomic and better described by differential constraints than by omnidirectional motion constraints. Thus, the selected odometry type is differential. For the same reasons described in the GMapping node implementation details, Section 4.5.1.B, the expected error for the odometry alphas is reduced by a factor of four. For that same reason, since the error is always kept small, the maximum number of particles is also decreased from 5000 to 200.
4.6 Semaphores and Traffic Signs Detection and Recognition
Given the simplicity of the semaphore signs when compared to the traffic signs, and their importance in the competition, as they are used to start every single challenge and to provide instructions during the execution of specific tasks, a pragmatic image processing method is implemented for the detection of the semaphores. Due to the number and complexity of the traffic signs, however, a neural network is used to recognize them instead.
4.6.1 Semaphore Detection and Recognition
Based on state-of-the-art approaches, the implemented method uses a color filter to perform a preliminary detection and, then, a template matching technique to confirm the presence of a semaphore sign.
The current implementation of this module relies on OpenCV, as it provides all the image processing functions needed to identify the semaphore shapes.
4.6.1.A Algorithm
Upon the request of semaphore recognition, the module performs the following operations:
1. The input image is first converted from the RGB color space to the CIELAB color space for the same reasons described in Section 4.3.2.A.
2. After the color space conversion, the green and yellow colors of the original image are segmented based on a user-defined input array. Pixels within the specified color values are transformed to white, while the remaining ones are transformed to black, creating a binary image.
3. With the colors segmented, only the shapes of the green and yellow signals, plus arbitrary objects of the same colors, remain. The identified shapes are submitted to OpenCV's template matching function, matchTemplate, to validate the presence of a known shape. This technique outputs a similarity coefficient between each of the signal shapes (arrow, cross, or parking sign) and the image shape.
4. If the similarity coefficient between the known and identified shapes output by matchTemplate is greater than a specified value, the template semaphore shape is considered detected.
5. However, to be considered recognized and to increase the accuracy of the algorithm, a semaphore shape is only output if it is detected N times consecutively, where N is a user-defined input. If no semaphore shape is detected N consecutive times within a user-defined timeout, a timeout message is thrown instead.
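The confirmation logic of steps 3–5 can be sketched as follows. Here match_score is a crude binary-overlap stand-in for OpenCV's matchTemplate similarity coefficient, and all names are illustrative, not the thesis code:

```python
def match_score(image, template):
    """Fraction of the template's white pixels also white in a same-size binary image."""
    total = sum(sum(row) for row in template)
    hits = sum(1 for y, row in enumerate(template)
               for x, v in enumerate(row) if v and image[y][x])
    return hits / total if total else 0.0

def recognize(frames, template, threshold=0.8, n_required=3):
    """Report recognition only after n_required consecutive above-threshold matches."""
    streak = 0
    for frame in frames:
        if match_score(frame, template) >= threshold:
            streak += 1
            if streak >= n_required:
                return True
        else:
            streak = 0  # a miss resets the consecutive-detection counter
    return False
```

A single frame that fails to match resets the counter, so sporadic false positives do not trigger a recognition, which is the purpose of the N-consecutive rule.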
4.6.1.B User-defined inputs
• Color range values - Represent the minimum and maximum values of the three channels of
CIELAB color space in the form of an array. These are used to detect the green and yellow colors.
• Path to the templates - The path to the folder where the template shapes are stored.
• Template names - The image filenames that correspond to each distinct semaphore.
• Similarity coefficient - A percentage value, ranging from 0 to 1, for two shapes to be considered
the same. The higher the value, the more alike the two shapes need to be for a detection to be
successful.
• Consecutive detections - Integer number of consecutive detections of a shape for it to be con-
sidered as recognized.
4.6.1.C Implementation details
Since image processing techniques are generally complex and computationally heavy, this module is implemented as a ROS Service instead of a ROS Topic. This way, the node is not continuously checking for semaphores; instead, it only works on demand, when necessary. Provided that the location of the semaphores is known, the navigation and decision-making modules, developed in a parallel thesis, request the recognition of the semaphore based on the location of the robot, i.e., when it is near the zebra crossing.
Although there are red, green, and yellow semaphores, the current implementation only checks for green and yellow ones. Red semaphores are deliberately ignored: since the robot is already stopped when checking for semaphores, informing the navigation and decision-making modules that the semaphore is red, i.e., that the car should remain stopped, would not change the current state of the car.
Regarding the segmentation of the green and yellow colors, the range values provided by the user can also be obtained from the calibration script described in Section 4.3.2.C.
With the same script, it’s also possible to create the template files that are used on the template
matching step by saving the output binary and cropping the signal shape from the binary image. The
captured templates are depicted in Fig. 4.5.
4.6.2 Traffic Sign Detection and Recognition
To carry out the task of detecting the traffic signs, the neural network developed by Redmon and Divvala [104] is utilized. It's chosen over other machine learning methods due to its real-time applicability, as it is able to achieve an output rate of 50 Hz in its latest iteration [105].
(a) Left Template (b) Forward Template
(c) Right Template (d) Parking Template
Figure 4.5: Binary templates used by the template matching method.
Their neural network is called You Only Look Once (YOLO) due to its method of operation. Contrary to other neural network approaches, where a region-based classifier searches for an object within an image several times, the authors of YOLO solve the problem in a single pass over the image. This is accomplished by segmenting the image into a grid where, for each cell, a bounding box for a hypothetical object is created and a class probability is estimated. By multiplying the probability of the class by the confidence of the bounding box containing an object, a weight is calculated for each bounding box. The weights are thresholded and, when above a specified value, the bounding box and identified class are considered detected.
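The weighting step described above amounts to the following sketch, where each candidate carries a class probability and a box objectness confidence (the data layout and names are illustrative assumptions, not YOLO's actual tensors):

```python
def score_boxes(candidates, threshold=0.5):
    """Keep boxes whose class probability x objectness confidence clears the threshold.

    candidates: iterable of (box, class_name, class_prob, objectness) tuples.
    """
    detections = []
    for box, cls, class_prob, objectness in candidates:
        weight = class_prob * objectness  # per-box weight, as described for YOLO
        if weight >= threshold:
            detections.append((box, cls, weight))
    return detections
```

For instance, a box with class probability 0.9 and objectness 0.8 (weight 0.72) survives a 0.5 threshold, while one with 0.6 and 0.4 (weight 0.24) is discarded.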
4.6.3 Implementation Details
YOLO is implemented on ROS in the darknet_ros package, the only required inputs being the trained weights of each class and the class identifiers. The classes are easily trained by creating bounding boxes over a data set of collected images containing the required classes and providing the required training time. Given these inputs, the package outputs the identified classes along with a confidence value for each detection and the bounding boxes locating the objects in the original image.
However, given that the traffic signs are all very similar within each class (those classes being informative, mandatory, and danger signs), and given that the wide-angle camera has a low resolution, YOLO is implemented in two layers to increase the success rate of the traffic sign recognition.
The first layer detects the class of the traffic sign while the second takes the bounding box of the
detected sign to recognize only the interior part of the sign. This approach can be visualized in Fig. 4.6.
(a) Detection of the traffic sign class (b) Recognition of the interior part of the sign
Figure 4.6: Visual demonstration of the YOLO implementation.
5 Evaluation
Contents
5.1 Performance of the modules
5.2 Performance of the system on the competition environment
To grade the performance of each module and ensure that the specifications described in Table 1.1 are met, each individual module is assessed separately, along with its contribution to the system as a whole during the different competition challenges. This assessment is carried out in the Autonomous Driving challenge of the Robotics Open and in a testing environment that mimics the Robotics Open scenario. The two environments can be visualized in Fig. 5.1.
(a) Testing environment (b) Competition environment
Figure 5.1: Comparison between the testing and the (real) competition environment.
ROS built-in functionalities, namely its 3D visualizer tool, RViz, and its bag files, which capture and store ROS messages, are exploited both to visually assess the accuracy and to systematically test the robot with the same source data every time.
To ensure that the navigation and decision-making modules have both accurate and correct infor-
mation to plan a safe and fast route, the main characteristics on which each module can be evaluated
are:
• Accuracy of the obtained values.
• Computation cost and real-time applicability.
• Ratio between correct and incorrect operations.
All the output data displayed or described in this chapter is captured directly from the robot. To recap, the robot has an Intel Core i7-7700 Quad-Core CPU at 3.6 GHz, 16 GB of DDR4 Random Access Memory (RAM), and an NVIDIA GeForce GTX 1050 Ti 4 GB GPU, on which all data was processed.
5.1 Performance of the modules
For the evaluation of the distinct software nodes according to the determined evaluation characteristics, different environment setups are created to independently test and showcase the strengths and weaknesses of each module. Unless specified otherwise, the input parameters of each module used in the tests are the same values described in Chapter 4.
5.1.1 Ground and obstacles segmentation
5.1.1.A Setup
To evaluate the performance of the ground and obstacle segmentation algorithm, it’s submitted to
two different scenarios, namely:
Scenario A - An obstacle is placed in front of the vehicle at various distances to check whether there's any iteration in which the segmentation algorithm doesn't find the ground and, consequently, provides no output. The distance between the obstacle and the Kinect ranges from 70 cm to 5 m, the latter being the maximum range specified by Microsoft. The specification from the navigation and decision-making modules is that no part of the robot is ever closer than 30 cm to an obstacle. Since the Kinect points forward and the front of the vehicle is 40 cm away from the camera, the total distance from an obstacle to the camera is at least 70 cm, rendering it the minimum distance.
Scenario B - Two full laps in different directions are performed to assess whether the segmentation algorithm is affected by the motion of the robot.
In both tests, the number of iterations the algorithm requires to identify the ground plane and the output rate of the segmented clouds are also assessed.
5.1.1.B Results
By capturing 10000 different point clouds with distances ranging from 70 cm to 5 m, it's possible to conclude that the segmentation module's algorithm is always able to identify the ground, and in a single iteration. Even in the worst case, with an obstacle at 70 cm from the camera, the plane with the most points is always the ground. This is the worst case because it's at that distance that the ground has the fewest points, as can be visualized in Figure 5.2.
Figure 5.2: Visual demonstration of the number of points belonging to the ground when an obstacle is at 10 cm from the car's front.
Upon the completion of two full laps, Scenario B, the 1033 different point clouds captured demonstrated exactly the same results as the previous scenario.
Both scenarios demonstrate that the algorithm is capable of achieving an output rate of 12.4 Hz, rounded down to one decimal place. Since the output rate of the Kinect's point cloud from the iai_kinect package is 15 Hz, the loss is not significant. To put this into perspective, some of the cheapest real laser scanners cost nearly 7 times more than the Kinect and are only capable of outputting a laser scan at 10 Hz. The described results are summarized in Table 5.1.
Table 5.1: Ground and obstacles segmentation module performance in the different testing scenarios, on 10000 different point clouds for Scenario A and 1030 for Scenario B.

Scenario | Successful Segmentation Ratio [%] | Output Rate [Hz]
A        | 100                               | 12.49
B        | 100                               | 12.43
5.1.2 Road lines to virtual walls
5.1.2.A Setup
To assess the quality of the novel algorithm that generates a virtual laser scan from the identified continuous road lines, it's put to the test in two different scenarios, namely:
Scenario A - Since the density of points of the ground point cloud output by the segmentation node
decreases as the distance to the Kinect increases, the number of points that belong to a
line also decreases. As a consequence, beyond a certain distance, the line is no longer detected. To
identify the distance at which a line stops being detected, a line parallel to the car is observed in RViz.
Scenario B - At distances ranging from the identified maximum distance down to the minimum dis-
tance, a line perpendicular to the car is continuously converted to a laser scan. All the distances
output by each beam are captured during 5000 iterations. From the acquired data, an
assessment is made of the precision of the sensor and algorithms involved as a function of distance.
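The line-to-scan conversion evaluated here can be sketched as binning the detected line points by bearing and keeping the closest range per beam. The sketch below is a simplified stand-in with assumed parameters (beam count, 70 º field of view, 2.5 m cut-off), not the thesis node itself:

```python
import numpy as np

def points_to_virtual_scan(xy, n_beams=360, fov=np.radians(70.0), max_range=2.5):
    """Convert 2-D line points (robot frame, x forward) into laser-scan ranges.

    Each beam keeps the closest point inside its angular bin, so a painted
    line behaves like a solid wall for an occupancy-grid localizer."""
    angles = np.arctan2(xy[:, 1], xy[:, 0])
    ranges = np.hypot(xy[:, 0], xy[:, 1])
    keep = (np.abs(angles) <= fov / 2) & (ranges <= max_range)
    scan = np.full(n_beams, np.inf)          # inf where no line point is seen
    bins = ((angles[keep] + fov / 2) / fov * (n_beams - 1)).astype(int)
    np.minimum.at(scan, bins, ranges[keep])
    return scan

# Scenario B analogue: a straight line 1 m ahead, perpendicular to the car.
line = np.c_[np.full(200, 1.0), np.linspace(-0.5, 0.5, 200)]
scan = points_to_virtual_scan(line)
finite = scan[np.isfinite(scan)]
print(round(finite.min(), 3))                # closest beam reports ~1.0 m
```

Keeping the minimum range per bin is the design choice that makes the virtual scan conservative: if several line points fall into one beam, the nearest one defines the "wall".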
5.1.2.B Results
The tests carried out in Scenario A can be observed in Fig. 5.3. The image demonstrates that an
observed line is only continuous up to around 2.5 m. From that point onward, the density of points
decreases to the point that it is not possible to assess whether the observed points belong to a line or are simply
noise. Additionally, in the same figure, the parking space lines demonstrate that the lines are linear only
until approximately 2.5 m. For that reason, the ground-segmented point cloud is only filtered up to that
distance.
Figure 5.3: Visual demonstration of the effect of distance on the number of points belonging to a track line.
By collecting the output laser scan from Scenario B, it is possible to measure the precision of the
algorithm by calculating the standard deviation. For an accuracy test, a sensor with higher accuracy
than the sensor under consideration is required. Since one is not available, a precision test was carried out
instead. The results of the test are summarized in Table 5.2.
Table 5.2: Results of the virtual walls node based on distance for 5000 iterations.
Distance [m]   Mean [m]        Standard Deviation [m]
1              1.007820125     0.006284148
1.5            1.493256291     0.01143513
2              1.99648467      0.015652938
2.5            2.490326246     0.019847568
These results demonstrate that, in the worst case scenario, the error is within ±2 cm approximately
68% of the time.
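The precision figures above follow from the captured beam ranges: under approximately normal error, about 68 % of samples fall within one standard deviation of the mean. A minimal sketch of the computation, with simulated readings standing in for the 5000 captured ones at 2.5 m:

```python
import numpy as np

# Simulated beam readings, assumed normally distributed around the
# Table 5.2 values (mean 2.490 m, std 0.0198 m at the 2.5 m mark).
rng = np.random.default_rng(0)
readings = rng.normal(loc=2.490, scale=0.0198, size=5000)

mean = readings.mean()
std = readings.std(ddof=1)                       # sample standard deviation
within_one_sigma = np.mean(np.abs(readings - mean) <= std)

print(round(mean, 2), round(within_one_sigma, 2))  # ~2.49 and ~0.68
```

The same mean/standard-deviation computation applies unchanged to the real captured data; only the `readings` array differs.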
5.1.3 Obstacle detection
5.1.3.A Setup
For the same reasons described in the assessment of the virtual walls node, Section 5.1.2, only the
precision of the obstacle detection node is assessed. The laser scan output of the node is captured for
5000 iterations with an obstacle at distances ranging from 1 m to 5 m. The distance of each beam is stored to assess
the precision of the sensor and algorithms involved in relation to the distance.
5.1.3.B Results
From the experimental tests, it is observed that the node is capable of outputting at the same rate
as the segmented obstacles point cloud at its input, i.e., 12 Hz. From the captured data, the results
obtained are summarized in Table 5.3.
Table 5.3: Results of the obstacle detection node based on distance for 5000 iterations.
Distance [m]   Mean [m]        Standard Deviation [m]
1              1.000007447     0.002642223
2              2.002703115     0.001730372
3              2.997153746     0.001660753
4              4.004420191     0.002361059
5              4.997393216     0.003318544
One interesting fact is that, although the sensor used and the downsampling process are the same
for both inputs of the virtual walls and obstacle detection nodes, the precision of the obstacle detection
node is much higher. This can be explained by the fact that the floor is almost parallel to the camera's
optical axis while the obstacles are perpendicular to it. As demonstrated in Fig. 5.3, the density of
points on the floor decreases at a high rate as the distance increases. This does not happen with
perpendicular obstacles, which explains the difference in precision.
Another curious fact is that the optimal distance to detect an obstacle seems to be between 2 m and
3 m, since those distances present the lowest standard deviation values. This is contrary to expectation,
since the error would be expected to grow with distance.
In the worst-case scenario, the error is within ±0.003318544 m 68% of the time and within ±0.009955632
m, very close to 1 cm, 99.7 % of the time. Even though the field of view of the Kinect One is fairly limited
(70 º), since it allows the detection of obstacles up to 5 m with only ± 1 cm of error, it can replace a real
laser scanner to detect the obstacles in the competition.
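The ± 1 cm bound quoted above is simply the three-sigma interval of the worst-case standard deviation from Table 5.3:

```python
sigma = 0.003318544           # worst-case standard deviation (5 m, Table 5.3)
print(round(1 * sigma, 9))    # one-sigma (68 %) bound
print(round(3 * sigma, 9))    # 0.009955632 -> the three-sigma (99.7 %) bound
```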
5.1.4 Semaphore Detection
5.1.4.A Setup
For the assessment of the ratio between successful and unsuccessful detections of semaphores, the
car is placed in the optimal position and in a hypothetical worst-case position. The optimal position,
Scenario A, is the starting position, where the front of the car is directly below the screen and the camera
is pointing directly at the screen. The hypothetical worst-case position, Scenario B, corresponds to a
translation of 12.5 cm in position and a clockwise rotation of the vehicle of 25 º. Those values
correspond to an error 5 times larger than the one specified in Table 1.1. The semaphore detection service
is called 500 times for each distinct semaphore to assess the ratio between correct and incorrect
detections.
5.1.4.B Results
To assess the different causes of failure, no detections (service timeouts) are distinguished from in-
correct detections of a given semaphore sign, i.e., when a semaphore different from the one displayed
on the screen is marked as detected. The success ratio for each scenario and each semaphore, along
with the number of correct, incorrect, and timed-out calls, is summarized in Table 5.4.
The results of this test demonstrate that the applied template matching technique is able to detect
arrows with a higher success rate than the 'P' shape of the parking semaphore. This is probably due to
the complexity of that shape when compared to the arrows.
As expected from template matching techniques, the success rate is heavily affected when the
observed shape is resized and rotated. In fact, in the optimal scenario, the results are satisfactory
Table 5.4: Results of the semaphore detection node for the different scenarios.
Semaphore   Success ratio [%]   Correct calls   Incorrect calls   Timeout calls
               A       B           A      B         A      B          A      B
Forward      96.2    82.0        481    410        2     49         17     41
Left         97.8    87.4        489    437        0     23         11     34
Right        97.4    93.8        487    469        0      0         13     31
Park         91.6    77.4        458    387        0      0         42    113
as the robot detects the semaphore correctly at least 9 out of 10 times. However, in the worst-case
scenario, the detected shapes appear rotated, causing the number of incorrect calls to increase as well.
Taking the Forward semaphore as an example: since the vehicle is rotated clockwise
in the present test, the Forward arrow is also rotated clockwise, making it more similar to the
Right semaphore. The same happens with the Left semaphore, which explains the number of incorrect
calls for both semaphores. The Right semaphore, however, even though rotated, is still fairly
similar to its original shape. If a counter-clockwise test were performed, the success rate of the Right
semaphore would be expected to decrease and that of the Left semaphore to increase, for
the reasons described.
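The success ratios in Table 5.4 follow directly from the call counts, as each semaphore is requested 500 times per scenario; for example, for the Forward semaphore:

```python
CALLS_PER_SEMAPHORE = 500     # service invocations per semaphore and scenario

def success_ratio(correct_calls):
    """Success ratio [%] as reported in Table 5.4."""
    return 100 * correct_calls / CALLS_PER_SEMAPHORE

print(success_ratio(481))     # Forward, Scenario A -> 96.2
print(success_ratio(410))     # Forward, Scenario B -> 82.0
```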
5.1.5 Traffic Sign Recognition
5.1.5.A Setup
Since the traffic sign recognition needs to be performed during a challenge, the signs have to be de-
tected while the robot is moving. However, during the detection challenge, the velocity component is not
assessed, only the success rate of the traffic sign recognition. For that reason, to assess the success
rate of the traffic sign recognition node, in collaboration with Francisco's navigation and decision-making
modules from [6], the car is programmed to complete 100 laps at 1 m/s with all traffic signs randomly
spread across the track. The test is repeated 5 times, changing the position of the signs each time.
5.1.5.B Results
With the implementation of the two layers, the detection of the traffic sign shape and, then, the recognition
of the interior shapes of the traffic sign, YOLO's machine learning algorithm is always able to recognize
all traffic signs with a 100 % success rate in the test environment.
Although the competition's data set is small when compared to the number of real traffic signs,
the capability of the machine learning algorithm is demonstrated. Compared to the image processing
technique (from the semaphore detection node), the effort it takes to train the neural
network is relatively small compared to the effort of implementing an image processing algorithm
able to detect the same object. Also, by training the neural network with rotated traffic signs, it is
also possible to detect rotated objects. For this to happen with the image processing approach, the
template matching technique would have to be applied to the rotated objects too, rendering it a less scalable
approach.
5.1.6 Global Map
5.1.6.A Setup
To grade the quality of the map generated by the global map node, the map is visually assessed.
A qualitative analysis is carried out instead of a quantitative one, as there is no available sensor capable of
measuring the exact location of the track's lines, precluding an accuracy assessment.
5.1.6.B Results
After generating 10 maps with GMapping and visually assessing them, the map most similar
to the real track is the one presented in Fig. 5.4.
Figure 5.4: Global map of the track captured by GMapping on the test environment.
In this figure, it is possible to observe that the captured map is very similar to the shape of the real
track and that it even captures the track's irregularities, for instance, the interior part of the curve
on the right side of the image. The overall quality of the map is satisfactory and it greatly increases
the scalability of the project, as the approach can be applied to virtually any track with continuous lines. The small
irregularities of the generated map are: artifacts near the lines where a loop closure seems to have
occurred and readjusted the observed lines, which overlap but are not totally
coincident; the parallel parking area appearing slightly curved although it is straight in the test environment; and the
zebra crossing displaying a high amount of noise, even though the essence of the zebra shape is captured.
5.1.7 Global Localization
5.1.7.A Setup
Since the test environment is not equipped with an external system capable of providing a ground truth
for the robot, i.e., a more accurate position than the one estimated by the robot itself, to assess
the accuracy of the global localization node the robot is commanded to complete 50 full laps in which
the starting and ending positions are the same. With this, the end pose provides the absolute error of
the global localization at the end of a challenge. This test is very relevant since, by the rules defined
for the competition, the robot needs to start and stop on the zebra line, which is 5 cm thick,
in line with our position accuracy specification of ± 2.5 cm. The same test is performed
at different velocities to evaluate their effect on the global localization.
5.1.7.B Results
From the tests carried out, it is determined that the final localization error increases sharply with
velocity, roughly doubling for each 0.5 m/s increment above 1 m/s. These results are demonstrated in
Table 5.5, which relates the standard deviation to the velocity of the vehicle.
Table 5.5: Results of the global localization node at the end of 50 full laps, for different velocities.
Velocity [m/s]   Position Standard Deviation [m]   Orientation Standard Deviation [rad]
0.5              0.02230471                        0.010995574
1.0              0.02636126                        0.024260076
1.5              0.04606525                        0.036600291
2.0              0.09157002                        0.052780230
Given that the car travels 36 m in each full lap, and knowing that the car's position error is approximately
± 10 cm 68 % of the time in the worst-case scenario, it is demonstrated that the global localization
node is able to correct the majority of the odometry drift and provide an accurate absolute position
estimate to the navigation and decision-making modules.
Even though the specified localization accuracy of ± 2.5 cm is only met for a continuous velocity of
0.5 m/s, the orientation error is kept within ± 3 º 68% of the time in the worst-case scenario, which
is within the established specifications.
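The ± 3 º figure corresponds to the worst-case orientation standard deviation in Table 5.5 converted to degrees:

```python
import math

sigma_rad = 0.052780230                   # orientation std at 2.0 m/s (Table 5.5)
print(round(math.degrees(sigma_rad), 2))  # ~3.02, i.e. the quoted ± 3 degrees
```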
To compare the accuracy of the global localization with that of the relative pose estimation node, the paths
output by both are depicted in Fig. 5.5, taken from one of the tests at 1 m/s.
Figure 5.5: Visual demonstration of the path provided by the global localization, represented in green, and by the relative pose estimation, represented in red.
In the figure, the lack of accuracy of the relative pose estimation node and the drift accumulated over
the course of two full laps are noticeable, justifying the inclusion of the global localization node to
correct the overall drift and increase the accuracy to the specified values.
5.2 Performance of the system on the competition environment
To assess the performance of the whole system, it is put to the test in the Portuguese Robotics Open 2018
Autonomous Driving challenge. The objective of the competition is to demonstrate a proof-of-concept
in which the system is evaluated in a real scenario by a third-party entity. Based on the performance
of the system, the modules are iterated and improved, taking into consideration the observed behavior
and penalties.
5.2.1 Robotics Open Environment
In April 2018, at the Portuguese Robotics Open, the designed system was put to its initial test. The
majority of the challenges were successfully completed and the system achieved
2nd place, very close to 1st in terms of score, as demonstrated in Appendix B.
This proof-of-concept validated the success of the current approach. Contrary to the other teams'
approaches, where the robot typically follows a line until a specific condition is met, for instance, when the
dashed line is no longer visible or the zebra crossing appears, the present system is fully aware of its
absolute localization within the track, along with the absolute positions of the obstacles in it,
allowing it to plan the best trajectory ahead of time instead of reacting to such
specific conditions or obstacles.
The system is designed in such a way that, even if the environment changes, the robot is able
to generate a new map of the environment, provided that it contains white lines, and a new optimal
trajectory can be given to the robot. This makes the autonomous vehicle system more scalable than a
simple line-follower approach.
During the competition, the car's target speed was set to 1 m/s as the system was still under de-
velopment. The nodes were not fully optimized and, under some circumstances, the robot's absolute
position would jump by approximately half a meter. This heavily affected the navigation and
decision-making modules and, for that reason, the target speed was set to a lower value than specified.
The first iteration of the system did not include the traffic sign recognition node, so that
challenge was not completed.
Additionally, regarding the modules of the present thesis, the Kinect One was heavily affected by
the environment's infra-red emissions: a number of random points would be visible in the Kinect's
point cloud, even though they were not visible to the naked eye. This not only affected the performance
of the Kinect but also invalidated its use as an obstacle detector, since the provided data was
unreliable. In order to mitigate this problem, a real laser scanner had to be added to the robot during the
competition. In the testing environment the aforementioned problem is also visible, albeit at a very
small fraction of what happened in the FNR environment.
To conclude, as also demonstrated in the nodes' evaluation sections and in Francisco's thesis [6],
the final iteration of the robot is able to successfully execute all the challenges within the defined spec-
ifications, with the exception of the target speed. Even though it is able to navigate at that speed, given
the length of the track, the localization error, and the control instability, the car-like robot commits a large
number of penalties. Since these heavily affect the scores, lower speeds are targeted. Nevertheless,
the speed of 1 m/s almost doubles all of the other teams' speeds. This demonstrates that, even though
the specified goals are ambitious, the objectives have been met.
6 Conclusions and Future Work

Contents
6.1 Conclusions
6.2 Future work
6.1 Conclusions
This document presents a proposal of a system with the objective of competing in the Autonomous
Driving challenge of the Portuguese Robotics Open. It focuses on the necessary hardware and elec-
tronics that serve as the basis of the designed autonomous driving robotics platform, and on the
modules necessary to provide absolute localization within a known map and to perceive the surrounding objects
of the environment, be they obstacles, semaphores, traffic signs, or construction cones.
To address this problem, state-of-the-art sensors and methods used in the literature are reviewed
and implemented. The proposed system takes advantage of state-of-the-art methods to assess its
absolute position within a map and to plan the optimal trajectory, presenting an uncommon and unique
solution in the event's history.
This thesis outlines all the necessary steps to make the autonomous driving vehicle a reality. It starts
with the design of the basic electronic components and circuitry to power and support the low-level
sensors and actuators, goes through the creation of a hardware interface that bridges the
hardware and the Robot Operating System, and finishes with the implementation and evaluation of
the proposed software nodes.
To address the environment's lack of features for applying a global localization algorithm,
a novel technique to generate virtual walls based on the track's lines is developed, enabling occupancy-
grid-based localization algorithms like MCL. With these virtual walls, it is possible to map virtually
any environment filled with continuous lines and localize the robot in it. This approach provides a scal-
able solution should the map of the environment change in subsequent years. Additionally, using
the same sensor, the Kinect One, an approach is also presented to detect the navigable path and
the obstacles standing in it, together with two distinct methods to recognize the different signs present in the
environment, namely the semaphores and the traffic signs.
The developed system is able to localize the robot, at a constant velocity of 1 m/s, with an expected
error of ± 0.0263 m in position and ± 1.375 º in orientation. The robot is able to meet the
specified ± 0.025 m at lower velocities; nonetheless, during the challenges, the robot is able to finish
within the expected stopping area of 0.05 m most of the time. Given that speed is a big factor in
the score and that the expected error at 1 m/s is very close to the specified limit, this is set as the
target velocity. Even though the robot is able to reach the defined target speed of 2 m/s, this ambitious
velocity is almost impracticable due to the size of the track, the localization error at that velocity, and the
instability of the control at that pace.
Regarding the detection of obstacles, these are detected within a range of 0.5 m to 5 m with an
expected error of approximately 1 cm 99.73 % of the time, which is five times less than specified, and
at a rate of 12 Hz, which is more than double the specified rate.
The pragmatic solution presented for the detection of the semaphores is capable of fulfilling the
objectives of the competition and is demonstrated to be a viable solution when the data set is small and
when the control of the car is optimal, i.e., if the car is always able to point directly at the semaphores.
The neural network approach for the traffic signs is capable of achieving a 100 % success rate and
could also be used for the semaphores to increase the success rate when the semaphores appear rotated,
i.e., when the car is not pointing directly at them.
The first iteration of the designed system was showcased at the Portuguese Robotics Open competition, held in April
2018, achieving 2nd place, very close in terms of score to the 1st position, confirming the success of
the concept and its advantages. The final iteration of the autonomous driving system presents a more
stable solution, demonstrated to be able to address all of the competition challenges.
6.2 Future work
Given that an academic robotics platform for autonomous driving is built, it can serve as a basis for
a forthcoming work. Below, it’s presented a list of the possible improvements on the presented work:
• Implementation of the ground segmentation and virtual walls nodes on the GPU. Although the cur-
rent implementation achieves a rate of 12 Hz, it does so only by downsampling the
Kinect's point cloud. The accuracy of the algorithm could increase if the voxel size of
the downsampling process were reduced while maintaining a high output rate.
• Replacing the implemented linear power regulators with switching regulators to decrease the overall
energy dissipation of the circuit and, consequently, the temperature inside the robot's frame.
• Validating whether adding a wheel odometry system to each wheel, which was not possible due
to the suspension of the vehicle used, increases the accuracy of the relative pose estimation.
• Evaluating the developed system in a different environment to assess the scalability of the devel-
oped solution.
Bibliography
[1] (2017) Autonomous driving competition overview. [Online]. Available: robotica2017.isr.uc.pt/index.php/en/competitions/major/autonomous-driving
[2] [Online]. Available: https://ec.europa.eu/commission/commissioners/2014-2019/moedas/announcements/european-conference-connected-and-automated-driving-together-shaping-future_en
[3] [Online]. Available: https://ec.europa.eu/transport/road_safety/specialist/statistics_en
[4] [Online]. Available: https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/812115
[5] “SAE J3016: Taxonomy and Definitions for Terms Related to On-Road Motor Vehicle Automated
Driving Systems,” Society of Automotive Engineers, Standard, Sep. 2016.
[6] F. Varandas, “Autonomous driving of competition robot - navigation and decision making,” 2018.
[7] S. Thrun, W. Burgard, and D. Fox, Probabilistic robotics. MIT press, 2005.
[8] S. Thrun, J.-S. Gutmann, D. Fox, W. Burgard, B. Kuipers et al., “Integrating topological and metric
maps for mobile robot navigation: A statistical approach,” in AAAI/IAAI, 1998, pp. 989–995.
[9] A. Saffiotti, “Handling uncertainty in control of autonomous robots,” Applications of Uncertainty
Formalisms, pp. 198–224, 1998.
[10] H. W. Sorenson and A. R. Stubberud, “Non-linear filtering by approximation of the a posteriori
density,” International Journal of Control, vol. 8, no. 1, pp. 33–51, 1968.
[11] R. E. Kalman et al., “A new approach to linear filtering and prediction problems,” Journal of basic
Engineering, vol. 82, no. 1, pp. 35–45, 1960.
[12] R. Negenborn, “Robot localization and kalman filters,” Utrecht Univ., Utrecht, Netherlands, Mas-
ter’s thesis INF/SCR-0309, 2003.
[13] G. Bishop and G. Welch, “An introduction to the kalman filter,” Proc of SIGGRAPH, Course, vol. 8,
no. 27599-23175, p. 41, 2001.
[14] P. S. Maybeck, Stochastic models, estimation, and control. Academic press, 1982, vol. 3.
[15] M. Grewal and A. Andrew, “Kalman filtering,” in International Encyclopedia of Statistical Science.
Springer, 2001.
[16] E. A. Wan and R. Van Der Merwe, “The unscented kalman filter for nonlinear estimation,” in Adap-
tive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC.
The IEEE 2000. Ieee, 2000, pp. 153–158.
[17] S. J. Julier and J. K. Uhlmann, “A new extension of the kalman filter to nonlinear systems,” in Int.
symp. aerospace/defense sensing, simul. and controls, vol. 3, no. 26. Orlando, FL, 1997, pp.
182–193.
[18] S. Se, D. Lowe, and J. Little, “Local and global localization for mobile robots using visual land-
marks,” in Intelligent Robots and Systems, 2001. Proceedings. 2001 IEEE/RSJ International Con-
ference on, vol. 1. IEEE, 2001, pp. 414–420.
[19] D. Fox, W. Burgard, and S. Thrun, “Markov localization for mobile robots in dynamic environments,”
Journal of Artificial Intelligence Research, vol. 11, pp. 391–427, 1999.
[20] D. Fox, W. Burgard, F. Dellaert, and S. Thrun, “Monte carlo localization: Efficient position estima-
tion for mobile robots,” AAAI/IAAI, vol. 1999, no. 343-349, pp. 2–2, 1999.
[21] K. Kanazawa, D. Koller, and S. Russell, “Stochastic simulation algorithms for dynamic probabilis-
tic networks,” in Proceedings of the Eleventh conference on Uncertainty in artificial intelligence.
Morgan Kaufmann Publishers Inc., 1995, pp. 346–351.
[22] F. Dellaert, D. Fox, W. Burgard, and S. Thrun, “Monte carlo localization for mobile robots,” in
Robotics and Automation, 1999. Proceedings. 1999 IEEE International Conference on, vol. 2.
IEEE, 1999, pp. 1322–1328.
[23] F. Dellaert, W. Burgard, D. Fox, and S. Thrun, “Using the condensation algorithm for robust, vision-
based mobile robot localization,” in Computer Vision and Pattern Recognition, 1999. IEEE Com-
puter Society Conference on., vol. 2. IEEE, 1999, pp. 588–594.
[24] S. Thrun, D. Fox, W. Burgard, and F. Dellaert, “Robust monte carlo localization for mobile robots,”
Artificial intelligence, vol. 128, no. 1-2, pp. 99–141, 2001.
[25] L. Zhang, R. Zapata, and P. Lepinay, “Self-adaptive monte carlo localization for mobile robots using
range sensors,” in Intelligent Robots and Systems, 2009. IROS 2009. IEEE/RSJ International
Conference on. IEEE, 2009, pp. 1541–1546.
[26] J. Borenstein and L. Feng, “Measurement and correction of systematic odometry errors in mobile
robots,” IEEE Transactions on robotics and automation, vol. 12, no. 6, pp. 869–880, 1996.
[27] C. M. Wang, “Location estimation and uncertainty analysis for mobile robots,” in Robotics and
Automation, 1988. Proceedings., 1988 IEEE International Conference on. IEEE, 1988, pp. 1231–
1235.
[28] N. L. Doh and W. K. Chung, “A systematic representation method of the odometry uncertainty of
mobile robots,” Intelligent Automation & Soft Computing, vol. 12, no. 4, pp. 397–409, 2006.
[29] G. Klein and D. Murray, “Parallel tracking and mapping for small ar workspaces,” in Mixed and
Augmented Reality, 2007. ISMAR 2007. 6th IEEE and ACM International Symposium on. IEEE,
2007, pp. 225–234.
[30] D. Nister, O. Naroditsky, and J. Bergen, “Visual odometry for ground vehicle applications,” Journal
of Field Robotics, vol. 23, no. 1, pp. 3–20, 2006.
[31] F. Steinbrucker, J. Sturm, and D. Cremers, “Real-time visual odometry from dense rgb-d im-
ages,” in Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference
on. IEEE, 2011, pp. 719–722.
[32] A. Surrecio, U. Nunes, and R. Araujo, “Fusion of odometry with magnetic sensors using kalman
filters and augmented system models for mobile robot navigation,” in Proceedings of the IEEE
International Conference on Industrial Electronics, Dubrovnik, Croacia, 2005.
[33] V. A. Ajay, A. P. Suherlan, G. S. Soh, S. Foong, K. Wood, and K. Otto, “Localization and trajectory
tracking of an autonomous spherical rolling robot using imu and odometry,” in Proceedings of
the ASME 2015 International Design Engineering Technical Conferences and Computers and
Information in Engineering Conference, Boston, MA, USA, 2015, pp. 2–5.
[34] H. Chung, L. Ojeda, and J. Borenstein, “Accurate mobile robot dead-reckoning with a precision-
calibrated fiber-optic gyroscope,” IEEE Transactions on Robotics and Automation, vol. 17, no. 1,
pp. 80–84, 2001.
[35] M. Marron, J. C. Garcia, M. Sotelo, E. Lopez, and M. Mazo, “Fusing odometric and vision data with
an ekf to estimate the absolute position of an autonomous mobile robot,” in Emerging Technologies
and Factory Automation, 2003. Proceedings. ETFA’03. IEEE Conference, vol. 1. IEEE, 2003, pp.
591–596.
[36] S. B. Goldberg and L. Matthies, “Stereo and imu assisted visual odometry on an omap3530 for
small robots,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Com-
puter Society Conference on. IEEE, 2011, pp. 169–176.
[37] K. Berntorp, K.-E. Arzen, and A. Robertsson, “Sensor fusion for motion estimation of mobile robots
with compensation for out-of-sequence measurements,” in Control, Automation and Systems (IC-
CAS), 2011 11th International Conference on. IEEE, 2011, pp. 211–216.
[38] S. Thrun et al., “Robotic mapping: A survey,” Exploring artificial intelligence in the new millennium,
vol. 1, pp. 1–35, 2002.
[39] M. J. Mataric, “A distributed model for mobile robot environment-learning and navigation,” 1990.
[40] B. Kuipers and Y.-T. Byun, “A robot exploration and mapping strategy based on a semantic hi-
erarchy of spatial representations,” Robotics and autonomous systems, vol. 8, no. 1, pp. 47–63,
1991.
[41] J. Andrade-Cetto and A. Sanfeliu, “Topological map learning for a mobile robot in indoor envi-
ronments,” in Proceedings of the 9th Spanish Symposium on Pattern Recognition and Image
Analysis, vol. 1, 2001, pp. 221–226.
[42] S. Panzieri, F. Pascucci, R. Setola, and G. Ulivi, “A low cost vision based localization system for
mobile robots,” target, vol. 4, p. 5, 2001.
[43] T. Bailey, “Mobile robot localisation and mapping in extensive outdoor environments,” Ph.D. dis-
sertation, Univ. Sydney, Australian Ctr. Field Robotics, 2002.
[44] O. Pink, “Visual map matching and localization using a global feature map,” in Computer Vision
and Pattern Recognition Workshops, 2008. CVPRW’08. IEEE Computer Society Conference on.
IEEE, 2008, pp. 1–7.
[45] H. Kim, D. Lee, T. Oh, H.-T. Choi, and H. Myung, “A probabilistic feature map-based localization
system using a monocular camera,” Sensors, vol. 15, no. 9, pp. 21 636–21 659, 2015.
[46] A. Elfes, “Sonar-based real-world mapping and navigation,” IEEE Journal on Robotics and Au-
tomation, vol. 3, no. 3, pp. 249–265, 1987.
[47] J. Borenstein and Y. Koren, “The vector field histogram-fast obstacle avoidance for mobile robots,”
IEEE transactions on robotics and automation, vol. 7, no. 3, pp. 278–288, 1991.
[48] A. Arleo, J. d. R. Millan, and D. Floreano, “Efficient learning of variable-resolution cognitive maps
for autonomous indoor navigation,” IEEE Transactions on Robotics and Automation, vol. 15, no. 6,
pp. 990–1000, 1999.
[49] G. K. Kraetzschmar, G. P. Gassull, K. Uhl, G. Pags, and G. K. Uhl, “Probabilistic quadtrees for
variable-resolution mapping of large environments,” in Proceedings of the 5th IFAC/EURON sym-
posium on intelligent autonomous vehicles. July, 2004.
[50] K. M. Wurm, A. Hornung, M. Bennewitz, C. Stachniss, and W. Burgard, “Octomap: A probabilis-
tic, flexible, and compact 3d map representation for robotic systems,” in Proc. of the ICRA 2010
workshop on best practice in 3D perception and modeling for mobile manipulation, vol. 2, 2010.
[51] G. Weiß and E. v. Puttkamer, “A map based on laserscans without geometric interpretation,” in
Intelligent Autonomous Systems, vol. 4, no. 2. IOS Press, 1995, pp. 403–407.
[52] F. Lu and E. E. Milios, “Optimal global pose estimation for consistent sensor data registration,”
in Robotics and Automation, 1995. Proceedings., 1995 IEEE International Conference on, vol. 1.
IEEE, 1995, pp. 93–100.
[53] J. L. Crowley, F. Wallner, and B. Schiele, “Position estimation using principal components of range
data,” Robotics and Autonomous Systems, vol. 23, no. 4, pp. 267–276, 1998.
[54] R. Sim and G. Dudek, “Learning visual landmarks for pose estimation,” in Robotics and Automa-
tion, 1999. Proceedings. 1999 IEEE International Conference on, vol. 3. IEEE, 1999, pp. 1972–
1978.
[55] I. Ulrich and I. Nourbakhsh, “Appearance-based place recognition for topological localization,” in
Robotics and Automation, 2000. Proceedings. ICRA’00. IEEE International Conference on, vol. 2.
IEEE, 2000, pp. 1023–1029.
[56] S. Segvic, A. Remazeilles, A. Diosi, and F. Chaumette, “A mapping and localization framework
for scalable appearance-based navigation,” Computer Vision and Image Understanding, vol. 113,
no. 2, pp. 172–187, 2009.
[57] D. F. Wolf and G. S. Sukhatme, “Towards mapping dynamic environments,” in Proceedings of the International Conference on Advanced Robotics (ICAR), 2003, pp. 594–600.
[58] D. Wolf and G. S. Sukhatme, “Online simultaneous localization and mapping in dynamic envi-
ronments,” in Robotics and Automation, 2004. Proceedings. ICRA’04. 2004 IEEE International
Conference on, vol. 2. IEEE, 2004, pp. 1301–1307.
[59] C. Häne, T. Sattler, and M. Pollefeys, “Obstacle detection for self-driving cars using only monocular cameras and wheel odometry,” in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 5101–5108.
[60] P. Fleischmann and K. Berns, “A stereo vision based obstacle detection system for agricultural
applications,” in Field and Service Robotics. Springer, 2016, pp. 217–231.
[61] D. Holz, S. Holzer, R. B. Rusu, and S. Behnke, “Real-time plane segmentation using RGB-D cameras,” in Robot Soccer World Cup. Springer, 2011, pp. 306–317.
[62] B. Peasley and S. Birchfield, “Real-time obstacle detection and avoidance in the presence of
specular surfaces using an active 3d sensor,” in Robot Vision (WORV), 2013 IEEE Workshop on.
IEEE, 2013, pp. 197–202.
[63] A. De La Escalera, L. E. Moreno, M. A. Salichs, and J. M. Armingol, “Road traffic sign detection
and classification,” IEEE transactions on industrial electronics, vol. 44, no. 6, pp. 848–859, 1997.
[64] W. Shadeed, D. I. Abu-Al-Nadi, and M. J. Mismar, “Road traffic sign detection in color images,”
in Electronics, Circuits and Systems, 2003. ICECS 2003. Proceedings of the 2003 10th IEEE
International Conference on, vol. 2. IEEE, 2003, pp. 890–893.
[65] N. Yabuki, Y. Matsuda, Y. Fukui, and S. Miki, “Region detection using color similarity,” in Circuits
and Systems, 1999. ISCAS’99. Proceedings of the 1999 IEEE International Symposium on, vol. 4.
IEEE, 1999, pp. 98–101.
[66] C.-Y. Fang, C.-S. Fuh, S.-W. Chen, and P.-S. Yen, “A road sign recognition system based on
dynamic visual model,” in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003
IEEE Computer Society Conference on, vol. 1. IEEE, 2003, pp. I–I.
[67] M.-Y. Fu and Y.-S. Huang, “A survey of traffic sign recognition,” in Wavelet Analysis and Pattern
Recognition (ICWAPR), 2010 International Conference on. IEEE, 2010, pp. 119–124.
[68] X. Gao, K. Hong, P. Passmore, L. Podladchikova, and D. Shaposhnikov, “Colour vision model-
based approach for segmentation of traffic signs,” Journal on Image and Video Processing, vol.
2008, p. 6, 2008.
[69] A. L. Yuille, D. Snow, and M. Nitzberg, “Signfinder: Using color to detect, localize and identify
informational signs,” in Computer Vision, 1998. Sixth International Conference on. IEEE, 1998,
pp. 628–633.
[70] N. Kehtarnavaz, N. C. Griswold, and D. Kang, “Stop-sign recognition based on color/shape pro-
cessing,” Machine Vision and Applications, vol. 6, no. 4, pp. 206–208, 1993.
[71] H. Fleyeh, “Traffic and road sign recognition,” Ph.D. dissertation, 2008.
[72] C. Schiekel, “A fast traffic sign recognition algorithm for gray value images,” in Computer Analysis of Images and Patterns. Springer, 1999, pp. 836–836.
[73] Y. Aoyagi and T. Asakura, “A study on traffic sign recognition in scene image using genetic algo-
rithms and neural networks,” in Industrial Electronics, Control, and Instrumentation, 1996., Pro-
ceedings of the 1996 IEEE IECON 22nd International Conference on, vol. 3. IEEE, 1996, pp.
1838–1843.
[74] G. Piccioli, E. De Micheli, P. Parodi, and M. Campani, “Robust road sign detection and recognition
from image sequences,” in Intelligent Vehicles’ 94 Symposium, Proceedings of the. IEEE, 1994,
pp. 278–283.
[75] N. Barnes and A. Zelinsky, “Real-time radial symmetry for speed sign detection,” in Intelligent
Vehicles Symposium, 2004 IEEE. IEEE, 2004, pp. 566–571.
[76] G. Loy and N. Barnes, “Fast shape-based road sign detection for a driver assistance system,” in
Intelligent Robots and Systems, 2004.(IROS 2004). Proceedings. 2004 IEEE/RSJ International
Conference on, vol. 1. IEEE, 2004, pp. 70–75.
[77] E. Perez and B. Javidi, “Scale and illumination-invariant road sign detection,” in Lasers and Electro-
Optics Society 2000 Annual Meeting. LEOS 2000. 13th Annual Meeting. IEEE, vol. 2. IEEE, 2000,
pp. 748–749.
[78] M. Nakamura, S. Kodama, T. Jimbo, and M. Umeno, “Searching and recognition of road signpost
using ring detection network,” in Systems, Man, and Cybernetics, 1999. IEEE SMC’99 Conference
Proceedings. 1999 IEEE International Conference on, vol. 2. IEEE, 1999, pp. 190–195.
[79] G. Piccioli, E. De Micheli, P. Parodi, and M. Campani, “Robust method for road sign detection and
recognition,” Image and Vision Computing, vol. 14, no. 3, pp. 209–223, 1996.
[80] S. Azami, S. Katahara, and M. Aoki, “Route guidance sign identification using 2-d structural de-
scription,” in Intelligent Vehicles Symposium, 1996., Proceedings of the 1996 IEEE. IEEE, 1996,
pp. 153–158.
[81] Y. B. Lauziere, D. Gingras, and F. P. Ferrie, “A model-based road sign identification system,” in
Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE
Computer Society Conference on, vol. 1. IEEE, 2001, pp. I–I.
[82] H. Ohara, I. Nishikawa, S. Miki, and N. Yabuki, “Detection and recognition of road signs using
simple layered neural networks,” in Neural Information Processing, 2002. ICONIP’02. Proceedings
of the 9th International Conference on, vol. 2. IEEE, 2002, pp. 626–630.
[83] J. Miura, T. Kanda, S. Nakatani, and Y. Shirai, “An active vision system for on-line traffic sign
recognition,” IEICE TRANSACTIONS on Information and Systems, vol. 85, no. 11, pp. 1784–1792,
2002.
[84] C.-Y. Fang, C.-S. Fuh, P. Yen, S. Cherng, and S.-W. Chen, “An automatic road sign recognition
system based on a computational model of human recognition processing,” Computer vision and
Image understanding, vol. 96, no. 2, pp. 237–268, 2004.
[85] A. Shustanov and P. Yakimov, “CNN design for real-time traffic sign recognition,” Procedia Engineering, vol. 201, pp. 718–725, 2017.
[86] S. Lafuente-Arroyo, P. Gil-Jimenez, R. Maldonado-Bascon, F. Lopez-Ferreras, and S. Maldonado-Bascon, “Traffic sign shape classification evaluation I: SVM using distance to borders,” in Intelligent Vehicles Symposium, 2005. Proceedings. IEEE. IEEE, 2005, pp. 557–562.
[87] M. Shi, H. Wu, and H. Fleyeh, “Support vector machines for traffic signs recognition,” in Neural
Networks, 2008. IJCNN 2008.(IEEE World Congress on Computational Intelligence). IEEE Inter-
national Joint Conference on. IEEE, 2008, pp. 3820–3827.
[88] P. Gil-Jimenez, H. Gomez-Moreno, P. Siegmann, S. Lafuente-Arroyo, and S. Maldonado-Bascon,
“Traffic sign shape classification based on support vector machines and the FFT of the signature of
blobs,” in Intelligent Vehicles Symposium, 2007 IEEE. IEEE, 2007, pp. 375–380.
[89] Y.-Y. Nguwi and A. Z. Kouzani, “Detection and classification of road signs in natural environments,”
Neural computing and applications, vol. 17, no. 3, pp. 265–289, 2008.
[90] J. Greenhalgh and M. Mirmehdi, “Real-time detection and recognition of road traffic signs,” IEEE
Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1498–1506, 2012.
[91] P. Paclík, J. Novovičová, and R. P. Duin, “Building road-sign classifiers using a trainable similarity
measure,” IEEE Transactions on Intelligent Transportation Systems, vol. 7, no. 3, pp. 309–321,
2006.
[92] Robot Operating System history. [Online]. Available: http://www.ros.org/history/
[93] tf2, ROS Wiki. [Online]. Available: http://wiki.ros.org/tf2
[94] rosserial_arduino, ROS Wiki. [Online]. Available: http://wiki.ros.org/rosserial_arduino
[95] varandaas/xuning, GitHub repository. [Online]. Available: https://github.com/varandaas/xuning
[96] T. Wiedemeyer, “IAI Kinect2,” https://github.com/code-iai/iai_kinect2, Institute for Artificial Intelligence, University Bremen, 2014–2015, accessed June 12, 2015.
[97] G. Bradski, “The OpenCV Library,” Dr. Dobb’s Journal of Software Tools, 2000.
[98] R. B. Rusu and S. Cousins, “3D is here: Point Cloud Library (PCL),” in IEEE International Confer-
ence on Robotics and Automation (ICRA), Shanghai, China, May 9-13 2011.
[99] pointcloud_to_laserscan, ROS Wiki. [Online]. Available: http://wiki.ros.org/pointcloud_to_laserscan
[100] T. Moore and D. Stouch, “A generalized extended kalman filter implementation for the robot op-
erating system,” in Proceedings of the 13th International Conference on Intelligent Autonomous
Systems (IAS-13). Springer, July 2014.
[101] G. Grisetti, C. Stachniss, and W. Burgard, “Improved techniques for grid mapping with rao-
blackwellized particle filters,” IEEE transactions on Robotics, vol. 23, no. 1, pp. 34–46, 2007.
[102] T. Coroiu and O. Hinton, “A platform for indoor localisation, mapping, and data collection using an
autonomous vehicle,” Master’s Theses in Mathematical Sciences, 2017.
[103] amcl, ROS Wiki. [Online]. Available: http://wiki.ros.org/amcl
[104] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object
detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition,
2016, pp. 779–788.
[105] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
A Printed Circuit Board
Figure A.1: Circuit board
Figure A.2: Circuit schematics
B Competition results
Overall scores
TEAM SCORE D1 D2 D3 D4 P1 P2 B1 B2 V1
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
ROTA-2018 UA/IEETA 779 146 209 266 245 28 28 60 60 60 60
Conde FEUP 963 187 195 206 22 37 60 0 60 60 60 150 285 115
N3E Xuning ISR 942 141 231 40 0 35 255 0 91 228 0 0 108 50 60 0 50 60
VAI Sozinho IPLeiria 508 0 108 272 176 62 0 60
D1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 4 3 4 4 4 3 4 1 0 2 4
time 30 174 31 54 133 115 59 79 20 22 48
Early departure 15
Going outside of the track (partial) 10 2 1 1
Going outside of the track (total) 20 1 1 1 1
Stopping outside the stopping area 10 1
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 146 209 266 187 195 141 231 40 0 108 272
M1 M2 M2 M1 M3 M1 M2 M4 M1 M2 M3
D2 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 4 4 0 1 4 4 2 0
time 35 70 134 0 30 75 54 38 0
Early departure 15
Going outside of the track (partial) 10
Going outside of the track (total) 20 1 1 1
Stopping outside the stopping area 10 1
Wrong direction at the signaling panels 25 1 2 2
Not stopping at the stop sign 60 1
Early departure at the stop sign 25
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 245 206 0 35 255 176 62 0
M4 M2 M2 M2 M4 M4 M4 M4
D3 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 1 1 1 1 1 2 4
time 40 17 17 33 33 28 44 112
Early departure 15 1
Going outside of the track (partial) 10 1 1
Going outside of the track (total) 20 1 1 1 1 1
Stopping outside the stopping area 10 1
Wrong direction at the signaling panels 25 1 1 2 1
Not stopping at the stop sign 60
Early departure at the stop sign 25
Small collision with track accessories 10 1
Collision with track accessories altering geometry 20 1
SCORE 28 28 22 37 0 91 228
M4 M4 M4 M4 M3 M3 M3
D4 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
half laps 50 0 0 2
time 45 0 0 52
Early departure 15
Going outside of the track (partial) 10 2
Going outside of the track (total) 20
Stopping outside the stopping area 10 1
Wrong direction at the signaling panels 25
Not stopping at the stop sign 60
Early departure at the stop sign 25
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 0 0 108
M4 M4 M4
V1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
sign type 25 3 6 3
sign number 25 3 6 3
Early departure 15 1 1
Going outside of the track (partial) 10
Going outside of the track (total) 20 1
Stopping outside the stopping area 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 150 285 115
M3 M4 M4
P1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 1 1 1 1
Early parking manoeuvre 10 1
Going outside of the track (partial) 10
Going outside of the track (total) 20
Irregular parking manoeuvre (out of bounds) 10
Irregular parking manoeuvre (out of alignment) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 60 50 60 60
M2 M1 M1 M1 M2
P2 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 0 1
Early parking manoeuvre 10
Going outside of the track (partial) 10 1
Going outside of the track (total) 20
Irregular parking manoeuvre (out of bounds) 10
Irregular parking manoeuvre (out of alignment) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 0 60
M2 M2 M3
B1 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 1 0
Early parking manoeuvre 10
Irregular parking manoeuvre (out of bounds) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 60 0
M3 M2 M2
B2 ref ROTA-2018 Conde N3E Xuning VAI Sozinho
T1 T2 T3 T1 T2 T3 T1 T2 T3 T1 T2 T3
parking manoeuvre 60 1 1 1 1
Early parking manoeuvre 10 1
Irregular parking manoeuvre (out of bounds) 10
Small collision with track accessories 10
Collision with track accessories altering geometry 20
SCORE 60 60 50 60
M4 M2 M3 M4