GPU implementation of a road sign detector based on particle swarm optimization
Luca Mussi • Stefano Cagnoni • Elena Cardarelli •
Fabio Daolio • Paolo Medici • Pier Paolo Porta
Received: 8 February 2010 / Revised: 9 July 2010 / Accepted: 23 September 2010 / Published online: 15 October 2010
© Springer-Verlag 2010
Abstract Road sign detection is a major goal of Advanced Driving Assistance Systems. Most published work on this problem shares the same approach, by which signs are first detected and then classified in video sequences, even if different techniques are used. While detection is usually performed using classical computer vision techniques based on color and/or shape matching, classification is most often performed by neural networks. In this work we present a novel modular and scalable approach to road sign detection, based on Particle Swarm Optimization, which takes both shape and color into account. In particular, in our approach the optimization of a single fitness function allows one both to detect a sign belonging to a certain category and, at the same time, to estimate its position with respect to the camera reference frame. To speed up processing, the algorithm implementation exploits the parallel computing capabilities offered by modern graphics cards and, in particular, by the Compute Unified Device Architecture by nVIDIA. The effectiveness of the approach has been assessed on both synthetic and real video sequences, which have been successfully processed at, or close to, full frame rate.
Keywords Particle swarm optimization · Road sign detection · GPU computing · Parallel computing
1 Introduction
Automatic traffic sign detection and classification is a very
important issue for the Advanced Driver Assistance Sys-
tems (ADAS) and road safety. It can both improve safety
and help navigation, by providing critical information the
driver could otherwise miss, limiting or compensating for
drivers’ distractions. Because of this, several road sign
detectors have been developed in the last 10 years [24].
In most industrial systems only speed limit signs are
detected, since these are considered to be the most relevant
for safety. Nevertheless, information provided by warning
signs, mandatory signs and all the remaining prohibitory
signs can also be extremely significant: ignoring the pres-
ence of such signs can lead to dangerous situations or even
accidents. Automatic road sign detection systems can be
used to both warn drivers in these situations and supply
additional environmental information to other on-board
systems such as the Automatic Cruise Control (ACC), the
Lane Departure Warning (LDW), etc.
Both gray-scale and color cameras can be used to this
purpose: in the first case, search is mainly based on shape
and can be quite demanding in terms of computation time
[10, 19]. Using a color camera, the search can be based
mainly on chromatic information: color segmentation is, in
L. Mussi · S. Cagnoni (corresponding author) · E. Cardarelli · P. Medici · P. P. Porta
Dipartimento di Ingegneria dell’Informazione, University
of Parma, Viale G. Usberti 181a, 43124 Parma, Italy
e-mail: [email protected]
L. Mussi
e-mail: [email protected]
E. Cardarelli
e-mail: [email protected]
P. Medici
e-mail: [email protected]
P. P. Porta
e-mail: [email protected]
F. Daolio
Information Systems Institute (ISI) - HEC, University
of Lausanne, Internef 135, CH-1015 Lausanne, Switzerland
e-mail: [email protected]
Evol. Intel. (2010) 3:155–169
DOI 10.1007/s12065-010-0043-y
general, faster than shape detection, even if it requires
additional filtering; however, images acquired by inex-
pensive color cameras can suffer from artifacts deriving
from Bayer conversion or from other problems related, for
instance, to color balance [2].
The approaches to traffic sign detection which rely on
color images are usually based on color bases different
from RGB; the HSV/HSI color space is the most frequently
used [7, 35] but other color spaces, such as CIECAM97 [9],
can be used as well. On the one hand, these spaces separate
chromatic information from lighting information, making
detection of a specified color mostly independent of light
conditions. On the other hand, the RGB [32] and YUV [31]
color spaces require no transformations, or just very simple
ones; however, they require more sophisticated segmenta-
tion algorithms, since the boundary between colors is
fuzzier. In order to make detection more robust, both color
segmentation and shape recognition can be used in coop-
eration [9].
As regards sign recognition, most methods are based on
computational intelligence techniques [6], the most fre-
quent being neural networks [8, 11] and fuzzy logic [13].
In this paper we present a novel approach to road sign
detection based on geometric transformations of basic sign
templates. Our approach projects sets of three-dimensional
points, which sample significant regions of the road sign
template to be detected and describe its shape, onto the
image plane according to a transformation which maps 3D
points in the camera reference frame onto the image; the
transformed set of points is then matched to the corre-
sponding image pixels. The likelihood of detection is
estimated using a similarity measure between color histo-
grams [34]. This procedure can actually estimate the pose
of any object described by a 3D model, within any
projection system. One of the
advantages over other model-based approaches is that this
approach does not need any preliminary pre-processing of
the image (like, for example, color segmentation) or any
reprojection of the full three-dimensional model [34].
Another peculiar feature with respect to similar work [7],
besides the aforementioned similarity measure, is that our
method relies upon Particle Swarm Optimization (PSO) to
estimate, by a single transformation, the pose of the sign in
the 3D space at the same time as the position of the sign in
the image.
Despite being more efficient than many other meta-
heuristics, PSO is still rather demanding in terms of
computational resources; therefore, a sequential imple-
mentation of the algorithm would be too slow for real-time
applications. As for all other metaheuristics, this is espe-
cially true when the function to be optimized is itself
computationally complex. The PSO algorithm we have
used in this work has been implemented within the nVIDIA
CUDA environment [23, 25], to take advantage of the
computing power offered by the massively parallel archi-
tectures available nowadays even on cheap consumer video
cards. As will be shown, thanks also to the parallel nature
of PSO, this choice allowed the final system to manage
several swarms at the same time, each specialized in
detecting a specific class of road signs.
This paper is organized as follows: Sect. 2 briefly
introduces PSO and its parallel implementation within
CUDA; Sect. 3 addresses the problem of road sign detec-
tion, motivating our approach and offering further details
on how shape and color information is processed to com-
pute fitness. Finally, in Sect. 4, we report results obtained
on both a synthetic video sequence containing two signs
and on two real video sequences, acquired on-board a car,
for a total running time of about 30 min.
2 GPU implementation of particle swarm optimization
Particle Swarm Optimization is a simple but powerful
optimization algorithm, introduced by Kennedy and Eb-
erhart [15]. In the last decade many variants of the basic
PSO algorithm have been developed [18, 26, 29] and
successfully applied to many problems in several fields
[28], image analysis being one of the most frequent
ones. In fact, image analysis tasks can often be reformulated
as the optimization of an objective function directly derived
from the physical features of the problem being solved.
Beyond this, PSO can often be more than a way to 'tune' the
parameters of another algorithm: it can directly be the main
building block of an original solution. For example, [3, 5, 21, 27, 38] use PSO
to directly infer the position of an object that is sought
in the image.
2.1 PSO basics
PSO searches for the optimum of a fitness function, fol-
lowing rules inspired by the behavior of flocks of birds in
search of food. A population of particles move within the
fitness function domain (usually termed their search
space), sampling the function in the points corresponding
to their position. This means that, after each particle’s
move, the fitness computed at its new position is evaluated.
In their motion, particles preserve part of their velocity
(inertia), while undergoing two attraction forces: the first
one, called cognitive attraction, attracts a particle towards
the best position it visited so far, while the second one,
called social attraction, pulls the particle towards the best
position ever found by the whole swarm. Based on this
model, in basic PSO, the following velocity and position
update equations are computed for each particle:
Vi(t) = w · Vi(t − 1) + C1 · R1 · [Xb,i(t − 1) − Xi(t − 1)] + C2 · R2 · [Xgb,i(t − 1) − Xi(t − 1)]   (1)

Xi(t) = Xi(t − 1) + Vi(t)   (2)
where the subscript i refers to the i-th dimension of the
search space, V is the velocity of the particle, C1, C2 are
two positive constants, w is the inertia weight, X(t) is the
particle position at time t, Xb(t - 1) is the best fitness
position visited by the particle up to time t - 1, Xgb(t - 1)
is the best fitness point ever visited by the whole swarm; R1
and R2 are two random numbers from a uniform distribu-
tion in [0, 1].
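As a minimal NumPy sketch (our own naming, not the paper's CUDA code), Eqs. 1-2 can be applied to a whole swarm at once; the default w, C1 and C2 below are the values later reported in Sect. 4:

```python
import numpy as np

def pso_step(X, V, Xb, Xgb, w=0.723, c1=1.193, c2=1.193, rng=None):
    """One synchronous PSO update (Eqs. 1-2) for a whole swarm.

    X, V : (n_particles, n_dims) current positions and velocities
    Xb   : (n_particles, n_dims) per-particle best positions
    Xgb  : (n_dims,) swarm best position (global-best variant)
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(X.shape)  # R1, R2: fresh uniform draws per dimension
    r2 = rng.random(X.shape)
    V = w * V + c1 * r1 * (Xb - X) + c2 * r2 * (Xgb - X)  # Eq. 1
    X = X + V                                             # Eq. 2
    return X, V
```

In the ring-topology variant used later in the paper, `Xgb` would simply be replaced, for each particle, by the local best of its neighborhood.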
Many variants of the basic algorithm have been devel-
oped [29], some of which have focused on the algorithm
behavior when different topologies are defined for the
neighborhoods of the particles [16]. A usual variant of PSO
substitutes Xgb(t − 1) with Xlb(t − 1), which represents the
‘local’ best position ever found by all particles within a
pre-set neighborhood of the particle under consideration.
This formulation admits, in turn, several variants,
depending on the topology of such neighborhoods. Among
others, Kennedy and coworkers evaluated different kinds
of topologies, finding that good performance could be
achieved using random and Von Neumann neighborhoods
[16]. Nevertheless, the authors also indicated that selecting
the most efficient neighborhood structure is, in general, a
problem dependent task. Since the random topology is
usually designed such that each particle communicates
with two random neighbors, most often using a simple ring
topology is adequate for the problem at hand, while
allowing for an easier implementation.
Whatever the choices of the algorithm structure, para-
meters, etc., and despite good convergence properties, PSO
is still an iterative process which, depending on problem
difficulty, may require several thousands (if not millions)
of particle updates and fitness evaluations. Therefore,
designing efficient PSO implementations is a problem of
great practical relevance. This becomes even more critical,
if one considers real-time applications to dynamic envi-
ronments in which, for example, the fast convergence
properties of PSO may be used to track moving points of
interest (maxima or minima of a specific dynamically-
changing fitness function) in real time. This is the case, for
example, of computer vision applications in which PSO has
been used to track moving objects [22] or to determine
location and orientation of objects or people [12, 21].
2.2 Implementing PSO within CUDA
We implemented a standard PSO with particles organized
with the classical ring topology [23]. The rationale behind
this choice is, on the one hand, the inadequacy of PSO
with synchronous best update and global-best topology,
which would have been the most natural and easiest
parallel implementation, for optimizing multi-modal
problems [4]. On the other hand, as reported above, PSO
with ring topology provides a very good compromise
between quality of results, efficiency, and easiness of
implementation.
The parallel programming model of CUDA allows
programmers to partition the main problem in many sub-
problems that can be solved independently in parallel.
Each sub-problem may then be further decomposed into
many modules that can be executed cooperatively in
parallel. In CUDA, each sub-problem becomes a thread
block, which is composed of a certain number of threads
which cooperate to solve the sub-problem in parallel.
The software modules that describe the operation of each
thread are called kernels: when a program running on the
CPU invokes a kernel, a unique set of indices is assigned
to each thread, to denote to which block it belongs and
its position inside it. These indices allow each thread
to ‘personalize’ its access to the data structures and,
in the end, to achieve problem parallelization and
decomposition.
To exploit the impressive computation capabilities of
graphic cards effectively within CUDA and implement a
parallel version of PSO, the best approach is probably to
consider the main phases of the algorithm as separate
tasks, parallelizing each of them separately: this way,
each phase can be implemented by a different kernel and
the whole optimization process can be performed by
iterating the basic kernels needed to perform one gener-
ational update of the swarm. Since the only way CUDA
offers to share data among different kernels is to keep
them in global memory (i.e., the RAM region, featuring
the slowest access time by far, which is shared by the
processes run by the GPU and the ones run by the CPU)
[25], the current status of our PSO must be saved there.
Data organization is therefore the first problem to tackle
to exploit the GPU read/write coalescing capability and
maximize the degree of parallelism of the implementa-
tion. With our data design, it is enough to appropriately
arrange the thread indices to run several swarms at the
same time very efficiently.
In order to have all tasks performed on the GPU, and
avoid, as much as possible, the bottleneck of data exchange
with the CPU using global memory, we generate pseudo-
random numbers running the Mersenne Twister [20]
algorithm directly on the GPU using the kernel available
within the CUDA SDK: this way the CPU load is virtually
zero.
In the following we briefly describe the three kernels
into which our PSO implementation has been subdivided.
2.2.1 Position update
A computation grid, divided into a number of blocks of
threads, updates the position of all particles being simu-
lated. Each block updates the data of one particle, while
each thread in a thread block updates one element of the
position and velocity arrays. In the beginning the particle’s
current position, personal best position, velocity and local
best information are loaded, after which the classical PSO
equations are applied.
2.2.2 Fitness evaluation
This kernel is scheduled as a computation grid composed
of one block for each particle being simulated (irrespective
of the swarm to which it belongs). Each block comprises a
number of threads equal to the total number of points that
describe a sign (three sets of 16 points each) so that the
projection of all points on the current image is performed in
parallel. Successively, each thread contributes to building
the histograms described in Sect. 3.3: the thread index
determines to which set/histogram the projected point
under consideration belongs, while the sampled color value
determines which bin of the histogram is to be incre-
mented. Finally the fitness value is computed according to
Eq. 9 where, once again, histogram similarity is assessed in
parallel.
2.2.3 Bests update
For each swarm, a thread block is scheduled with a
number of threads equal to the number of particles in the
swarm. As already mentioned, in our system we have used
a ring topology with radius equal to 1. Firstly, each thread
loads in shared memory both the current and the best
fitness values of its corresponding particle, to update the
personal best, if needed. Successively, the current local
best fitness value is found by computing the best fitness of
each particle’s neighborhood (including the particle and
the neighboring one on both sides of the ring), comparing
it to the best value found so far and updating it, when
necessary.
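The bests-update step just described can be sketched as follows (a NumPy illustration under our own naming; fitness is maximized, as for the detector, and the ring has radius 1 with wrap-around):

```python
import numpy as np

def update_bests(fitness, best_fitness, positions, best_positions):
    """Personal- and local-best update for a ring topology of radius 1.

    Returns the updated personal bests plus, for each particle, the
    index of the best particle among itself and its two ring neighbors.
    """
    n = len(fitness)
    improved = fitness > best_fitness               # personal best update
    best_fitness = np.where(improved, fitness, best_fitness)
    best_positions = np.where(improved[:, None], positions, best_positions)
    idx = np.arange(n)
    neigh = np.stack([(idx - 1) % n, idx, (idx + 1) % n])   # (3, n) ring
    local_best_idx = neigh[np.argmax(best_fitness[neigh], axis=0), idx]
    return best_fitness, best_positions, local_best_idx
```

Particle i then uses `best_positions[local_best_idx[i]]` as its Xlb in the velocity update.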
3 Road sign detection
In this section, we first introduce the basics of projective
geometry which underlie the theory of image acquisition
by means of a camera. Then, we describe the road sign
detection algorithm, based on computer vision and PSO,
focusing, in particular, onto the fitness function, whose
optimization drives the whole detection process.
3.1 The image projection model and camera calibration
The main goal of computer vision is to make a computer
analyze and ‘understand’ the content of an image, i.e., the
projection of a region of the real world which lies within
the field of view of a camera onto the camera’s sensor
plane (the image plane), in order for it to be able to take
some decision based on such an analysis.
The simplest mathematical model which describes the
spatial relationships between the 3D real-world scene and
its projection on the image pixels is a general affine
transform of the following form:
pi = A · Pi   (3)
where Pi represents the 3D coordinates of a point in the
world, while pi represents its 2D projection on the image
expressed with homogeneous coordinates. The matrix
A ∈ M3×3 models a central linear projection and is
usually expressed as

        ⎡ fx   0   u0 ⎤
    A = ⎢  0   fy  v0 ⎥ .   (4)
        ⎣  0   0    1 ⎦
Here, briefly, fx and fy are, respectively, estimates of the
focal length along the x and y direction of the image and
(u0,v0) is an estimate of the principal point (a.k.a. the
‘center of projection’) of the image. The process whose
aim is to determine the above parameters as well as the
position of the camera in the world (not needed in our case)
is usually referred to as camera calibration. The so-called
extrinsic parameters describe the camera position and
orientation while the intrinsic ones are those appearing in
Eq. 4, i.e. focal lengths and center of projection.
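The central projection of Eqs. 3-4 can be sketched as follows (the intrinsic parameter values used in the test are illustrative, and the function names are ours):

```python
import numpy as np

def project(P, fx, fy, u0, v0):
    """Central projection of 3D camera-frame points (Eqs. 3-4).

    P is (n, 3); returns (n, 2) pixel coordinates after the homogeneous
    divide. Points are assumed to lie in front of the camera (Z > 0).
    """
    A = np.array([[fx, 0.0, u0],
                  [0.0, fy, v0],
                  [0.0, 0.0, 1.0]])
    p = P @ A.T                   # homogeneous image coordinates (Eq. 3)
    return p[:, :2] / p[:, 2:3]   # divide by the third coordinate
```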
In the literature many algorithms for camera calibration
have been proposed for monocular [30, 36, 37] and stereo
systems [17, 39]. Many of them are based on particular
hypotheses that ease the calibration step; usually these
hypotheses are not verified in automotive environments,
such as short-distance perception, still scenarios, or still
camera. In our case, we can consider the last two con-
straints to be satisfied, since we are not taking into account
the correlation between subsequent frames of a video
sequence to infer a model of the car motion and forecast
trajectories, but analyze each frame as an independent
image. We also set the origin of our ‘world’ reference
frame which, in our case, is coincident with the camera
frame, to be a fixed point in the car (see Fig. 1).
Other issues to be tackled in general outdoor and,
especially, in automotive applications are related to spe-
cific conditions of outdoor environments. In fact, temper-
ature and illumination conditions can vary and can be
barely controlled. Regarding illumination, in particular,
extreme situations like direct sunlight or strong reflections
must be taken into account. Other light sources, such as car
headlights or reflectors, interfering with the external envi-
ronmental light, might also be present in a typical auto-
motive scene.
3.2 PSO-based road sign detection algorithm
Suppose that an object of known shape and color, having
any possible orientation, may appear within the field of
view of a calibrated camera. In order to detect its presence
and, at the same time, to precisely estimate its position, one
can use the following algorithm (see also [34]):
1. Consider a set of key contour points, of known
coordinates with respect to a reference position, and
representative of the shape and colors of the object.
2. Translate (and rotate) them to a hypothesized position
visible by the camera and project them onto the image.
3. Verify that color histograms of the sets of key points
match those of their projection on the image to assess
the presence of the object being sought.
Road signs are relatively simple objects belonging to
few possible classes characterized by just a few regions of
homogeneous colors. Each sign class can be described by a
model consisting of a few sets of key points which lie just
near the color discontinuities, with points belonging to the
same set being characterized by the same color. Once all
points in a set are projected onto the image plane, one must
verify that the colors of the corresponding pixels in the
image match the ones in the model. A further set of points,
lying just outside the object silhouette, can help verify
whether the object border has been detected: this is, in
general, confirmed when colors of corresponding pixels in
such a region are significantly different from those of the
object.
In Fig. 2 we show three classes of traffic signs (priority,
warning, and prohibitory signs), along with the sets of
points of the model we use to represent them. For each
model, we consider three sets of 16 points: one lies just
outside the external border (therefore, on the image back-
ground), one on the red band just inside the external border,
and one on the central white area, as close to the red border
as possible. Please notice that, for the prohibitory signs, we
use points uniformly distributed along their circular border
while, for the triangular priority and warning signs, points
are more densely distributed in proximity of the corners.
This choice reduces the chance of mismatching circular
signs to triangular ones since, at a similar scale, the corners
of triangular signs lie well outside the borders of the cir-
cular ones.
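For illustration only, the three concentric 16-point rings of a circular sign model could be generated as follows; the radii below are hypothetical, not the Italian standard dimensions of Fig. 2:

```python
import numpy as np

def circular_sign_sets(r_outer=300.0, band=50.0, margin=10.0, n=16):
    """Illustrative 16-point sets for a circular (prohibitory) sign model.

    Coordinates are in millimeters on the xy plane (z = 0): set 1 lies
    just outside the border (background), set 2 on the red band, set 3
    just inside it on the white inner area.
    """
    angles = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    def ring(r):
        return np.stack([r * np.cos(angles), r * np.sin(angles),
                         np.zeros(n)], axis=1)
    return (ring(r_outer + margin),          # set 1: background
            ring(r_outer - band / 2),        # set 2: red band
            ring(r_outer - band - margin))   # set 3: white inner area
```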
If a calibrated camera is available on a moving car,
given an estimate of the position and rotation of a road sign
inside the camera’s 3D field of view, the sets of points in
the world reference frame can be roto-translated to this
position and then projected onto the image plane, to verify
the likelihood of the estimate by matching color histo-
grams. All that is needed for detection is a method to
generate estimates of sign positions and refine them until
the actual position of a sign is found.
When a pose estimate is available for a sign, all points
belonging to its model can then be projected onto the
image plane using the following equation:
pi = A · (Re · Pi + te)   (5)
where te represents the offset/position of the sign in the x,
y and z directions with respect to the camera mounted on
the car (in our case, to the world reference system, as well).
Re is a 3 × 3 rotation matrix derived from the estimate of
the sign rotation: since a free rotation in the 3D space can
always be expressed with three degrees of freedom, it is
sufficient to estimate three values (e.g., the rotation angles
around the three axes) in order to represent all possible
rotations of a sign.
To this aim we apply PSO, as introduced in Sect. 2. In
our method, each swarm generates location estimates for a
specific class of signs; each particle in the swarm encodes
an estimate of the sign position by four values, which
represent its offsets along the x, y and z axes, as well as its
rotation around the vertical axis (yaw) in the camera ref-
erence frame. Although our system is already structured for
estimating all six degrees of freedom of a pose estimate, we
deliberately chose to ignore the rotation around the camera
optic axis (roll) and the horizontal axis (pitch) after some
preliminary tests. Although it makes sense to have the
system able to estimate every possible rotation of a sign,
we had no experimental evidence about this need, at least
Fig. 1 The projection model used in our system: OwX, OwY, OwZ are
the three axes of the world reference system (which in our case is
coincident with the camera reference system), f is the focal distance,
(u0,v0) is the projection center, and pi : (ui, vi) is the projection of a
generic point Pi : (Xi, Yi, Zi) onto the image plane
for the general road configurations we dealt with in our
tests. In fact, introducing all three angles would not affect
the complexity of the fitness function, since the full
transformation is already computed anyway but, of course,
it would significantly increase the size of the PSO search
space.
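Putting Eq. 5 together with the four-value particle encoding, the projection of a sign model for one particle can be sketched as below; taking the yaw rotation about the y (vertical) axis, and its sign, are our assumed conventions:

```python
import numpy as np

def project_model(points, particle, A):
    """Sketch of Eq. 5 for one particle encoding (tx, ty, tz, yaw).

    `points` is the (n, 3) set of model key points and `A` the 3x3
    intrinsic matrix of Eq. 4. Returns (n, 2) pixel coordinates.
    """
    tx, ty, tz, yaw = particle
    c, s = np.cos(yaw), np.sin(yaw)
    Re = np.array([[ c, 0., s],     # rotation about the vertical axis
                   [0., 1., 0.],
                   [-s, 0., c]])
    te = np.array([tx, ty, tz])
    p = (points @ Re.T + te) @ A.T  # pi = A (Re Pi + te), Eq. 5
    return p[:, :2] / p[:, 2:3]     # homogeneous divide
```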
A particle swarm can then be used to detect the presence
of signs of a specific class within the image, by assigning a
fitness value to each position estimate encoded by its par-
ticles. Such a value is proportional to the similarity
between the projections onto the image plane of the points
belonging to the sign model, obtained according to Eq. 5,
and the corresponding image pixels. If the fitness value in
the point (particle position) under evaluation is above a
given threshold, we consider that point to be the location of
a sign of the class associated to the swarm.
This is the main feature that characterizes this algorithm.
In fact, having an accurate estimation of the position and
orientation of a sign offers the possibility to rectify its
image by means of a simple Inverse Perspective Mapping
(IPM) transform [1], in order to obtain a pre-defined view.
This means it is always possible to obtain a standardized
view (in terms of size and relative orientation) of the
detected sign. This is the optimal input for a classifier
whose purpose is to recognize the content of a detected
sign irrespective of its actual orientation and distance.
Having the signs pre-classified into different classes in the
detection phase is a further significant advantage which
makes recognition easier and more accurate, since a sep-
arate classifier can then be used for each class. At the
moment, we only focus on detection: no classification of
the signs which have been detected is performed, even if
we plan to add a sign recognition module to our system in
the immediate future.
PSO is run at each new frame acquisition for a pre-
defined number of generations. Actually, the algorithm
structure and its GPU implementation make it possible to
schedule more than one PSO run per frame. On the one hand,
this offers a second opportunity to detect a sign which was
missed in the previous run on that frame. On the other hand,
it also allows each swarm to detect several signs belonging to
the same class in the same frame. In fact, when a sign is
detected in the first run, it is 'hidden' in order to prevent the
corresponding swarms from focusing on it again during the
subsequent runs.
In the next subsection we describe in detail the fitness
function we use in our PSO-based approach.
3.3 Fitness function
Let us denote the three sets of points used to describe each
sign class (see, for example, the models in Fig. 2) as
S1 = {s^1_i}, S2 = {s^2_i} and S3 = {s^3_i}, with s^x_i ∈ R^2 (they all
lie on the xy plane) and i ∈ [1, 16]. Based on the position
encoded by one particle and on the projection matrix
derived from the camera calibration, each set of points is
roto-translated and projected onto the current frame,
obtaining the corresponding three sets of points which lie
on the image plane: P1 = {p^1_i}, P2 = {p^2_i} and P3 = {p^3_i}.
To verify whether the estimated position is actually
correct, three color histograms [33] in the HSV colorspace,
one for each channel, are computed for each set Px with
x ∈ {1, 2, 3}. Let us denote each of them as H^c_x, formally
defined as:

H^c_x(b) = (1/n) Σ_{i=1}^{n} δ(I_c(p^x_i) − b)   (6)

where c ∈ {H, S, V} specifies the color channel, x ∈ {1, 2, 3}
identifies the set of points, b ∈ [1, Nbin] (Nbin being the
number of bins in the histogram), n represents the number
of points in the set (sixteen in our case), the function
δ(n) returns 1 when n = 0 and zero otherwise and, finally,
I_c(p) : R^2 → R maps the intensity of channel c at pixel
[Plots for Fig. 2: three scatter plots of the model point sets; axes in millimeters, ranging from −500 to 500; legend: set 1, set 2, set 3.]
Fig. 2 The three different sets of points used to represent a priority sign (left), a warning sign (center), and a prohibitory sign (right). The
dimensions of these models conform to the Italian standards (largest versions). All coordinates are expressed in millimeters
location p to a certain bin index. The term 1/n is used
to normalize the histogram such that Σ_{b=1}^{Nbin} H^c_x(b) = 1.
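A minimal sketch of the histogram of Eq. 6, assuming integer channel intensities in [0, 256); the binning scheme and names are ours:

```python
import numpy as np

def channel_histogram(values, n_bins, vmax=256):
    """Normalized histogram of Eq. 6 for one channel and one point set.

    `values` are the n sampled channel intensities I_c(p_i); each is
    mapped to one of `n_bins` bins and the counts are divided by n, so
    the bins sum to 1.
    """
    values = np.asarray(values)
    bins = (values * n_bins) // vmax      # bin index of each sample
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / len(values)
```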
Moreover, three additional histograms, denoted as H^c_ref, are
used as reference histograms for the red band surrounding all
three sign models taken into consideration. The
Bhattacharyya coefficient ρ [14], which offers an estimate
of the amount of overlap between two statistical samples, is
then used to compare the histograms:
ρ(H1, H2) = Σ_{b=1}^{Nbin} √(H1(b) · H2(b)).   (7)
The Bhattacharyya coefficient returns a real value
between 0, when there is no overlap at all between the
two histograms, and 1, when the two histograms are
identical. Finally, if we use
S_{x,y} = [ρ(H^H_x, H^H_y) + ρ(H^S_x, H^S_y) + ρ(H^V_x, H^V_y)] / 3   (8)
to express the similarity of the two triplets of histograms
computed for the sets of points x and y, we can express the
fitness function as
f = [k0 · (1 − S_{1,2}) + k1 · (1 − S_{2,3}) + k2 · S_{1,ref}] / (k0 + k1 + k2)   (9)

where k0, k1, k2 ∈ R+ are used to weigh the contributions
of the three terms appearing in the above equation.
Such a fitness function requires that:
• histograms computed on the first two sets of points be
as different as possible, hypothesizing that, in case the
sign had been detected, the background color near the
sign would differ significantly from that of the red band.
• the histogram of the points in the red band be as
different as possible from the one computed on the
inner area of the sign.
• histograms Hc1 resemble as much as possible the
reference histograms Hcref computed for the red band
surrounding the sign.
Histograms of regions having colors that differ only
slightly from the model, possibly because of noise, produce
high values of S1,ref. The fitness function f will therefore be
close to 1 only when the position of one particle is a good
estimate of the sign pose in the scene captured by the
camera.
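Eqs. 7-9 can be sketched as follows (the k defaults are the values reported in Sect. 4; the function names are ours):

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient between two normalized histograms (Eq. 7)."""
    return np.sum(np.sqrt(np.asarray(h1) * np.asarray(h2)))

def similarity(Hx, Hy):
    """Mean Bhattacharyya coefficient over channel histograms (Eq. 8).

    Hx, Hy map channel names to histograms; the paper uses H, S, V but,
    as discussed above, can drop channels, so we average whatever is given.
    """
    return np.mean([bhattacharyya(Hx[c], Hy[c]) for c in Hx])

def fitness(S12, S23, S1ref, k0=1.4, k1=1.0, k2=0.8):
    """Fitness of Eq. 9, combining the three similarity terms."""
    return (k0 * (1 - S12) + k1 * (1 - S23) + k2 * S1ref) / (k0 + k1 + k2)
```

With identical histograms the coefficient is 1, with disjoint ones it is 0, and a fitness close to 1 requires S1,2 and S2,3 near 0 together with S1,ref near 1.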
Actually, despite this being the most natural and general
way to express this sort of fitness function, we noticed that
system performance improved if we ignored the V (Value,
or Intensity) channel altogether. The reason probably lies
in the fact that the lighting conditions of the region
surrounding a sign are usually rather uniform, which
makes intensity information useless for the discriminant
properties of the fitness function, even if it affects its value.
At the same time, we verified that, in evaluating the
reference (red) color, it is preferable to also neglect the
S (Saturation) channel. This means that, for the red band,
we only use the H (Hue) channel, which is the only channel
that encodes pure color information.
4 Experimental results
The PSO parameters were set to w = 0.723, C1 = C2 = 1.193. Three swarms of 64 particles were run for up to 200 generations per frame to detect regulatory signs (of circular shape), warning signs (of triangular shape), and priority signs (of reversed triangular shape), respectively. The coefficients appearing in Eq. 9 were empirically set as follows: k0 = 1.4, k1 = 1.0, k2 = 0.8. The fitness threshold above which we considered a detection to have occurred was set to 0.9. The search space, in world coordinates, ranged from -4 m to 6.5 m along the horizontal direction (x axis), from -1.6 m to 6 m vertically (y axis), and from 9.5 m to 27 m in the direction of the car motion and of the camera optical axis (z axis). Finally, we allowed a range [-π/4, π/4] for sign rotation with respect to the vertical axis. All previous settings were chosen empirically, after some preliminary tests, so as to represent general values, independent of any particular image sequence, defining a reasonable invariant region for the PSO to explore.
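As a concrete illustration of these settings, the sketch below runs a standard global-best PSO with the inertia and acceleration values above over the stated 4-D search space (x, y, z in millimeters, yaw in radians). It is only a schematic stand-in for the paper's CUDA implementation: the toy fitness peaks at a hypothetical known pose, whereas the real fitness is, of course, the image-based function described above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Search-space bounds from the text: x, y, z offsets in mm, yaw in radians
LOW  = np.array([-4000.0, -1600.0,  9500.0, -np.pi / 4])
HIGH = np.array([ 6500.0,  6000.0, 27000.0,  np.pi / 4])
W, C1, C2 = 0.723, 1.193, 1.193   # inertia and acceleration coefficients

def pso(fitness, n_particles=64, generations=200):
    """Global-best PSO over the 4-D sign-pose space; returns the best pose found."""
    pos = rng.uniform(LOW, HIGH, size=(n_particles, 4))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.apply_along_axis(fitness, 1, pos)
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(generations):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, LOW, HIGH)   # keep particles inside the search space
        fit = np.apply_along_axis(fitness, 1, pos)
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest, float(pbest_fit.max())

# Toy fitness: 1 at a hypothetical true pose, decreasing with normalized distance
true_pose = np.array([1200.0, 900.0, 15000.0, 0.1])
fitness = lambda p: 1.0 - np.linalg.norm((p - true_pose) / (HIGH - LOW))

best, best_fit = pso(fitness)
```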
The overall test of the system was divided into two separate phases. In the first, synthetic video sequences were used to assess the ability of the system to correctly find and estimate sign poses. In the second, more significant phase, real-world images were processed to assess the system's detection performance in typical urban and suburban environments.
4.1 Tests on synthetic images
In the first test phase, we simulated a 3D rural environment with a road and a pair of traffic signs using the public-domain raytracer POV-Ray¹. We relied on the Roadsigns macros by Chris Bartlett² to simulate the signs, and on some ready-to-use objects by A. Lohmuller and F. A. Lohmuller³ to simulate the road. Bumps and dirt were added to the traffic signs to simulate more realistic conditions. Figure 3 shows a sample frame from one of the synthetic sequences. As time passes, the simulated car moves forward, zigzagging from left to right. At the same time, as they get closer to the car, the two signs rotate around their vertical axes. We introduced these rotations to test the ability of our system to estimate the actual roto-translation between the camera and the detected sign. In fact, in our case, each particle moves in R^4 and its position represents the x, y and z offsets of the sign as well as its rotation with respect to the vertical axis (yaw).

1 http://www.povray.org
2 http://lib.povray.org/collection/roadsigns/chrisb2.0/roadsigns.html
3 http://f-lohmueller.de/pov_tut/objects/obj_500i.htm

Evol. Intel. (2010) 3:155–169
Figure 4 shows three frames from the very beginning, the middle, and the end of the sequence, respectively. Image contrast has been reduced in this figure to better highlight the swarm positions. The white points superimposed on the images represent the best-fitness estimate of the sign position, while the black points depict the hypotheses represented by all other individuals. In Fig. 4a it is possible to see the two swarms during the initial search phase: in this case both are on the wrong target, despite already being in the proximity of a sign. Figure 4b, c show how the two swarms correctly converged onto their targets.
For a more detailed performance analysis, Fig. 5 shows
results obtained in estimating the actual position of the
signs throughout the sequence.
Figure 5 (top left) shows the actual x position and the estimated one (mean and standard deviation over one hundred runs), versus the frame number, for both the warning (light line) and the regulatory (dark line) signs. As can be seen, the horizontal position of the two signs is correctly detected until the end of the sequence with a precision of the order of centimeters. The sinusoidal trend of the two positions reflects the zigzagging behaviour of the simulated car. The top right part of the figure shows the results for the y coordinate. This time the actual position is constant, since the simulated car is supposed to have a constant pitch. Again, the estimated position is correct, with errors of just a few centimeters. Similar considerations can be made for the bottom left graph of Fig. 5, which reports
results of depth (z coordinate) estimation. Even if the signs are rather far from the car (about fifteen meters at the beginning), the estimates are very good, with errors of less than half a meter. The error is mostly due to the distance between the two outermost sets of points, which introduces a tolerance in the estimation of the actual position of the target border. Tightening this distance could improve the precision of the results but, at the same time, would make it more difficult to obtain high fitness values for signs which are far from the car: in that situation, the depth value is large and the projections of these two sets of points onto the image almost overlap, producing very similar color histograms.

Fig. 3 Sample frame taken from our synthetic video sequence simulating a country road scenario with two differently shaped road signs

Fig. 4 Output of a run of our road sign detection system at the very beginning (a), at middle length (b) and near the end (c) of a synthetic video sequence
Finally, the bottom right part of the figure shows the results of yaw estimation. In this case the results are not as precise: the rotation of the warning sign is estimated rather accurately, even if, in the first third of the simulation, the standard deviation is very high, while the rotation of the regulatory sign seems to be estimated almost at random. Since we are dealing with small angles, this must be due, again, to the tolerance in locating the signs introduced by the distance between the two external sets of points, which is more likely to affect fitness when matching circular signs than triangular ones.
4.2 Real-world tests
The system was then validated on two real-world image
sequences acquired with cameras placed on-board a car.
We compared results obtained by our system with those
obtained by a road sign recognition system previously
developed by some of the authors [2], affiliated to VisLab4,
the computer vision laboratory of our Department at the
University of Parma. For this reason we will refer to such a
system as the VisLab system from now on.
The benchmark used for this comparison comprised two of the sequences that were acquired during its development. The first sequence includes 10,000 frames with a resolution of 750 × 480 pixels, acquired at 7.5 fps by a PointGrey Firefly color camera, and is therefore about 22 min long; it was recorded while driving on the 'Parma orbital' on a sunny day. Since the road circumnavigates the town and the sequence includes a full tour of the orbital, it contains images featuring all possible light orientations. The second sequence, having the same resolution, is a little less than 5,000 frames long and was acquired by an AVT Guppy color camera at 7.5 fps on the 'Turin orbital' on a cloudy day. Images in this sequence feature more constant lighting but lower contrast. A significantly long segment of this sequence was acquired while driving in an urban environment, while most of the first
sequence was acquired on a separate-lane road. In the two sequences, the car on which the cameras were mounted runs at speeds ranging from 0 (a few crossings and roundabouts are present) to more than 70 km/h.

Fig. 5 Position estimation errors: (a) shows the estimates of the horizontal position (x coordinate) for both signs, (b) shows the vertical position estimates, and (c) the depth estimates. The estimates of the sign rotations around the vertical axis are shown in (d)

4 http://www.vislab.it
Using the settings reported above, signs were detected at an average distance from the car of about 12 m. Figure 6 displays screenshots of the system taken while running it: the main window area shows the camera view currently being processed, while the right vertical panel keeps track of the last road sign detected for each of the three categories (regulatory, warning, and priority; see caption).
In Fig. 7 it is possible to see some examples of the signs which could be detected, as they appear after rectification. Signs (a)–(c) were detected under normal lighting conditions. Signs (d) and (e) were in direct sunlight, while sign (f) was strongly backlit. Images (g) and (h) were the only two false positives we ever observed during all our tests: image (g) shows a red car partially occluded by another one, which creates an inverted triangular shape with a partially red border, and was detected as a priority sign, while (h), a detail of a commercial poster showing a red line which forms a triangle with a red border, was (correctly?) detected as a warning sign. Sign (i) was partially covered by a tree. Sign (j) was depicted inside a bigger yellow panel. The internal region of signs (b) and (i)–(l) is not white, which highlights the system's characteristic of looking for a red band coupled with two color discontinuities, one on either side. Finally, signs (m)–(o) were correctly detected in extremely poor lighting conditions: in the original frames, shown in Fig. 8a, they were difficult to see even for humans. Looking again at the rectified sign (a), and comparing it to the original image shown in Fig. 8b, it is possible to notice how the rectification process almost reduced the sign's rotation with respect to the vertical axis to zero, causing it to appear as if it had been observed from a perpendicular frontal viewpoint.
Fig. 6 Sample screen snapshots taken while processing real-world images. On the right, the images of the detected signs, correctly rectified to a standard frontal view, are shown: the top frame shows prohibitory signs, the middle frame warning signs, and the bottom frame priority signs. Images of the last detected signs remain visible in the window until a new sign of the same category is detected. The rectangle highlights the sign which was last detected
Validation software based on the unique code assigned to each sign made it possible to further assess the effectiveness of the system. Based on the annotations made during the development of the VisLab system, the validation software produced detection statistics in terms of false positives, false negatives, and correct detections (true positives).
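Such per-sequence statistics can be reproduced from the annotations with a simple tally. The sketch below is our reconstruction, assuming each annotated and each detected sign carries its unique code; the function name is ours, not the validation software's:

```python
def detection_stats(annotated, detected):
    """Compare a sequence's annotated sign codes against the detected ones.
    Returns (correct detections, false negatives, false positives)."""
    annotated, detected = set(annotated), set(detected)
    true_pos  = len(annotated & detected)   # signs present and detected
    false_neg = len(annotated - detected)   # signs present but missed
    false_pos = len(detected - annotated)   # detections with no matching sign
    return true_pos, false_neg, false_pos
```

For example, detection_stats(["A1", "B2", "C3"], ["A1", "C3", "X9"]) returns (2, 1, 1): two correct detections, one miss, one false positive.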
The above statistics were compared with those obtained, on the same sequence, by the VisLab system. Such a system implements a three-step process (see Fig. 9): color segmentation, shape detection, and classification based on several neural networks. Since illumination conditions deeply affect the first step, a dedicated gamma correction method was developed to compensate for the color drifts typical of sunset or dawn [2].

Fig. 7 Some of the signs that were detected, after rectification. Our system is able to detect signs in very different lighting conditions
Results obtained on the sequence shot in Parma are compared in Table 1, while Table 2 compares the results obtained on the sequence shot in Turin.
Since PSO is a stochastic algorithm, we also assessed
our system’s performance repeatability by executing sev-
eral runs for each sequence. The tables report, for our
system, the best and worst results obtained in the test runs.
In general, the performance of our system was comparable to, and often better than, that of the VisLab system in terms of correct detections. The system was also very selective, and only generated the two false positives reported in Fig. 7 which, curiously enough, were only noticed while annotating the results by hand and were not counted as false positives by the validation software because, by chance, the same frames actually contained a sign of exactly the same kind.
Considering the conditions in which the two test
sequences have been acquired and the results yielded by
the two systems on such sequences, our system seems to be
more robust than VisLab’s with respect to light changes
and critical conditions such as backlight or blinding. In
fact, while the results on the sequence shot in Turin are comparable, our system outperforms VisLab's on the sequence shot in Parma. The only exception, noticeable in both sequences, regards priority signs. This might seem to contrast with the good performance obtained on warning signs, with which they share shape and reference point sets, differing only in orientation. However, one should notice that the pole on which the signs are usually mounted intersects the warning sign's contour in a position (the middle of one side) where few reference points are present, while it does so close to a vertex of the priority sign, where the reference points for such a sign are denser. This suggests that the fitness function may be negatively affected by such an 'interference', and that a different distribution of reference points for this sign class may limit the problem. Another possible explanation is that priority signs are often mounted above other signs, such as roundabout signs, which could also affect the fitness function.
4.3 Computation efficiency
The CUDA implementation of the whole system described above made it possible to achieve very good execution times. Experiments were run on two different machines. The first is equipped with a 64-bit Intel(R) Core(TM)2 Duo processor running at 1.86 GHz and a moderately priced nVIDIA GeForce 8800GT video card, equipped with 1 GB of video RAM and fourteen multi-streaming processors (MPs), which adds up to a total of 112 processing cores. The second is powered by a 64-bit Intel(R) Core(TM) i7 CPU running at 2.67 GHz, combined with a top-level Quadro FX5800 graphics card, also by nVIDIA, having 4 GB of video RAM and thirty MPs or, in other words, 240 processing cores.

Fig. 8 Original frames where signs (m)–(o) (above), and (a) (below) of Fig. 7 were detected

Fig. 9 Block diagram of the VisLab system used as reference (from [2])
In spite of the great disparity between the two setups, no major differences were observed in execution times. This suggests that our system, despite the large number of operations required, still does not saturate the computational power of this kind of GPU.
All our tests were performed off-line. The input video sequences were stored as sets of individual frames, each loaded from disk with no streaming. Therefore, all our timings exclude disk input/output latency. In fact, live recording from a camera connected to the PC would permit transferring image data directly into the computer RAM in almost negligible time, avoiding slow disk accesses.
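The timing methodology can be sketched as follows: the per-frame clock is started only after a frame has been loaded, so disk latency never enters the measured processing time. The function names are ours, not those of the actual test harness.

```python
import time

def timed_fps(frame_refs, load, process):
    """Process a sequence frame by frame, timing only the processing step
    and excluding disk input/output, as in the experiments reported here."""
    total = 0.0
    for ref in frame_refs:
        frame = load(ref)                 # disk access: excluded from the timing
        t0 = time.perf_counter()
        process(frame)                    # detection work: this is what gets timed
        total += time.perf_counter() - t0
    return len(frame_refs) / total        # effective frames per second

# Toy example: loads take ~20 ms each, processing ~1 ms, so the
# reported rate reflects only the processing cost
fps = timed_fps(range(5),
                load=lambda ref: time.sleep(0.02) or ref,
                process=lambda frame: time.sleep(0.001))
```

Under this convention, a measured processing time of about 50 ms per frame corresponds to the reported 20 fps, regardless of how slow the disk reads are.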
With this setup, we obtained good performance with many different system settings. For example, simultaneously employing three swarms (one for each sign class under consideration), each composed of 64 particles and activated twice per frame for 200 generations, yielded a processing speed of about 20 frames per second (about 50 ms of actual processing time per frame). The frame rate improved to about 24 fps if each swarm was run just once per frame. Using swarms of 32 particles resulted in about 30 fps and 48 fps when two runs or one single run per frame were executed, respectively. In another set of tests, where only 100 generations per run were simulated, processing speed further increased to about 50 fps with two PSO runs per frame, and to 65 fps with a single run.
Running each swarm more than once per frame and 'hiding', in the subsequent runs, any sign previously detected allows a swarm to deal effectively with the presence of pairs of signs of the same class in the same frame, an event which occurs rather frequently.
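A minimal sketch of the 'hiding' step, under the assumption that it can be implemented by blanking out a detected sign's image region before the next run of the same swarm (the function name, bounding-box interface, and fill value are ours):

```python
import numpy as np

def hide_detection(frame, bbox, fill=0):
    """Return a copy of the frame with a detected sign's bounding box
    blanked out, so a subsequent run of the same swarm cannot score
    a high fitness on the sign that was already found."""
    x0, y0, x1, y1 = bbox
    masked = frame.copy()
    masked[y0:y1, x0:x1] = fill
    return masked

frame = np.ones((480, 750), dtype=np.uint8)   # resolution of the test sequences
masked = hide_detection(frame, bbox=(100, 50, 160, 110))
```

The original frame is left untouched, so the other swarms (looking for different sign classes) still see the full image.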
A sequential version of the algorithm would require up to about half a second per frame for the most demanding (and best-performing) of the settings reported above. This means that the same system could process video at no more than 4–5 fps, a speed which is not acceptable for this kind of application.
Therefore, our system is able to detect many types of road signs while processing images at a speed close to full frame rate. Considering that many existing systems actually work at speeds significantly lower than full frame rate, some time also remains available to perform sign classification. From this point of view, being able to reconstruct a frontal view of a given size for each sign detected by our system allows rather simple classifiers to be used in the recognition stage. Therefore, we expect that embedding such a stage in our system, which is indeed our final goal, will not affect processing speed significantly.
Even if it is not possible to directly compare the processing performance of our system with that of the VisLab system (which is stated to be able to run at about 13 fps, including sign classification, on a dual-processor Pentium PC with a clock frequency of 2 GHz), the performance of our system appears to be competitive with our reference.

Table 1 Results for the CUDA-PSO and the VisLab (reference) systems on the 'Parma orbital' sequence

Category (tot)   | VisLab: False Pos | False Neg | Correct Det | CUDA-PSO-RSD: False Pos min–max | False Neg min–max | Correct Det min–max
Priority (29)    | 4 | 9 (31%) | 20 (69%) | 1–1 | 9–10 (31–34.5%) | 19–20 (65.5–69%)
Prohibitory (51) | 1 | 33 (64.7%) | 18 (35.3%) | 0–0 | 20–26 (39.2–51%) | 25–31 (49–60.8%)
Warning (44)     | 0 | 23 (52.3%) | 21 (47.7%) | 1–1 | 10–13 (22.7–29.5%) | 31–34 (70.5–77.3%)
Total (124)      | 5 | 65 (52.4%) | 59 (47.6%) | 2–2 | 39–49 (31.5–39.5%) | 75–85 (60.5–68.5%)

Table 2 Results for the CUDA-PSO and the VisLab (reference) systems on the 'Turin orbital and downtown' sequence

Category (tot)   | VisLab: False Pos | False Neg | Correct Det | CUDA-PSO-RSD: False Pos min–max | False Neg min–max | Correct Det min–max
Priority (14)    | 0 | 1 (7.1%) | 13 (92.9%) | 0–0 | 4–8 (28.6–57.1%) | 6–10 (42.9–71.4%)
Prohibitory (53) | 4 | 22 (41.5%) | 31 (58.5%) | 0–0 | 20–22 (37.7–41.5%) | 31–33 (58.5–62.3%)
Warning (45)     | 0 | 10 (22.2%) | 35 (77.8%) | 0–0 | 7–8 (15.6–17.8%) | 37–38 (82.2–84.4%)
Total (112)      | 4 | 33 (29.5%) | 79 (70.5%) | 0–0 | 31–38 (27.7–33.9%) | 74–81 (66.1–72.3%)
5 Conclusions and future directions
We have shown that PSO, provided with a suitable fitness
function, can effectively detect traffic signs in real time.
Experimental results on both synthetic and real video
sequences showed that our system is able to correctly
estimate the position of the signs it detects with a precision of about ten centimeters in all directions, with depth being (rather obviously) the least accurate.

We have considered signs of three possible classes which, in normal road settings, account for far more than half of all sign occurrences. In any case, the three classes taken into consideration are also those most likely to be confused with one another. In fact, the sign classes omitted from our analysis have features that are mostly 'orthogonal' both to those characterizing the signs presently sought and to one another. These considerations, along with the modularity of our system, lead us to expect that extending the system to those other classes would be feasible with only small changes, obtaining comparable results in terms of quality. As concerns processing speed, the considerations about scalability made in Sect. 4.3 also induce optimistic expectations.
Finally, the way our system detects road signs permits
us to re-project all the images of the detected signs back to
a standard frontal view which represents the optimal input
for a classification step. Because of this, the introduction of
a classification module into the system has the highest
priority in our near-future agenda.
Acknowledgments We would like to express our thanks and
appreciation to Gabriele Novelli, Denis Simonazzi and Marco
Tovagliari for their help in tuning, testing and assessing the perfor-
mances of our system.
References
1. Mallot HA, Bulthoff HH, Little JJ, Bohrer S (1991) Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biol Cybern 64(3):177–185
2. Broggi A, Cerri P, Medici P, Porta PP, Ghisio G (2007) Real time
road signs recognition. In: Proceedings of IEEE intelligent
vehicles symposium 2007, Istanbul, Turkey, pp 981–986
3. Cagnoni S, Mordonini M, Sartori J (2007) Particle swarm opti-
mization for object detection and segmentation. In: Applications
of evolutionary computing. Proceeding of EvoWorkshops 2007,
Springer, pp 241–250
4. Cagnoni S, Mussi L, Daolio F (2009) Empirical assessment of the effects of update synchronization in Particle Swarm Optimization. In: Poster and workshop proceedings of the XI conference of the Italian association for artificial intelligence, Reggio Emilia, Italy. Electronic version, ISBN 978-88-903581-1-1
5. Anton Canalis L, Hernandez Tejera M, Sanchez Nielsen E (2006)
Particle swarms as video sequence inhabitants for object tracking
in computer vision. In: Proceedings of IEEE international con-
ference on intelligent systems design and applications (ISDA’06),
pp 604–609
6. Engelbrecht AP (2007) Computational Intelligence: an Intro-
duction, 2nd edn. Wiley, England
7. de la Escalera A, Armingol JM, Mata M (2003) Traffic sign recognition and analysis for intelligent vehicles. Image Vis Comput 21(3):247–258
8. de la Escalera A, Moreno LE, Puente EA, Salichs MA (1994)
Neural traffic sign recognition for autonomous vehicles. In:
Proceedings of IEEE 20th international conference on industrial
electronics, control and instrumentation 2:841–846
9. Gao X, Shevtsova N, Hong K, Batty S, Podladchikova L, Golo-
van A, Shaposhnikov D, Gusakova V (2002) Vision models based
identification of traffic signs. In: Proceedings of European con-
ference on color in graphics image and vision. Poitiers, France,
pp 47–51
10. Gavrila D (1999) Traffic sign recognition revisited. In: Muster-
erkennung 1999, 21. DAGM-symposium. Springer, pp 86–93
11. Hoessler H, Wohler C, Lindner F, Kreßel U (2007) Classifier
training based on synthetically generated samples. In: Proceed-
ings of 5th international conference on computer vision systems.
Bielefeld, Germany
12. Ivekovic S, John V, Trucco E (2010) Markerless multi-view
articulated pose estimation using adaptive hierarchical particle
swarm optimisation. In: Di Chio C et al (eds) Applications of
evolutionary computing: proceedings of EvoApplications 2010,
Istanbul, Turkey, Part I, LNCS 6024, Springer, pp 241–250
13. Jiang G-Y, Choi TY (1998) Robust detection of landmarks in
color image based on fuzzy set theory. In: Proceedings of IEEE
4th international conference on signal processing 2:968–971
14. Kailath T (1967) The divergence and Bhattacharyya distance
measures in signal selection. IEEE Trans Commun Technol
15(1):52–60
15. Kennedy J, Eberhart R (1995) Particle swarm optimization. In:
Proceedings IEEE international conference on neural networks,
IV, IEEE, New York, pp 1942–1948
16. Kennedy J, Mendes R (2002) Population structure and particle
swarm performance. In: Proceedings of congress on evolutionary
computation—CEC, IEEE, pp 1671–1676
17. Hyukseong K, Park J, Kak A (2007) A new approach for active
stereo camera calibration. In: Proceedings of IEEE international
conference on robotics and automation, pp 3180–3185
18. Liang J, Qin A, Suganthan P, Baskar S (2006) Comprehensive
learning particle swarm optimizer for global optimization of
multimodal functions. IEEE Trans Evol Comput 10(3):281–295
19. Loy G, Barnes N (2004) Fast shape-based road signs detection for
a driver assistance system. In: Proceedings of IEEE/RSJ inter-
national conference on intelligent robots and systems, Sendai,
Japan, pp 70–75
20. Matsumoto M, Nishimura T (1998) Mersenne Twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul 8(1):3–30
21. Mussi L, Cagnoni S (2008) Artificial creatures for object tracking
and segmentation. In: Applications of evolutionary computing:
proceedings of EvoWorkshops 2008, Springer, pp 255–264
22. Mussi L, Cagnoni S (2009) Particle swarm for pattern matching
in image analysis. In: Serra R et al (eds) Proceedings of WIVACE
2008, Italian Workshop on artificial life and evolutionary com-
puting, World Scientific, pp 89–98
23. Mussi L, Daolio F, Cagnoni S (2010) Evaluation of particle
swarm optimization algorithms within the CUDA architecture.
Inf Sci. doi:10.1016/j.ins.2010.08.045
24. Nguwi Y, Kouzani, A (2006) A study on automatic recognition of
road signs. In: Proceedings of IEEE conference on cybernetics
and intelligent systems. Bangkok, Thailand, pp 1–6
25. nVIDIA Corporation (2009) nVIDIA CUDA Programming Guide
v. 2.3. http://www.nvidia.com/object/cuda_develop.html
26. Montes de Oca M, Stutzle T, Birattari M, Dorigo M (2009)
Frankenstein’s PSO: a composite particle swarm optimization
algorithm. IEEE Trans Evol Comput 13(5):1120–1132
27. Owechko Y, Medasani S (2005) A swarm-based volition/atten-
tion framework for object recognition. In: Proceedings of IEEE
conference on computer vision and pattern recognition—work-
shops (CVPR’05). IEEE, pp 91–91
28. Poli R (2008) Analysis of the publications on the applications of
particle swarm optimisation. J Artif Evol Appl 2008(1):1–10
29. Poli R, Kennedy J, Blackwell T (2007) Particle swarm optimi-
zation: an overview. Swarm Intel 1(1):33–57
30. Tsai RY (1987) A versatile camera calibration technique for
high-accuracy 3D machine vision metrology using off-the-
shelf TV cameras and lenses. IEEE J Robot Autom 3:323–
344
31. Shadeed WG, Abu-Al Nadi DI, Mismar MJ (2003) Road traffic
sign detection in color images. In: Proceedings of IEEE 10th
international conference on electronics, circuits and systems
2:890–893
32. Soetedjo A, Yamada K (2005) Fast and robust traffic sign
detection. In: Proceedings of IEEE international conference on
systems, man and cybernetics 2:1341–1346
33. Sonka M, Hlavac V, Boyle R (2007) Image processing, analysis,
and machine vision, 3rd edn. CL-Engineering
34. Taiana M, Nascimento J, Gaspar J, Bernardino A (2008) Sample-
based 3D tracking of colored objects: a flexible architecture. In:
Proceedings of British machine vision conference (BMVC’08).
BMVA, pp 1–10
35. Vitabile S, Pollaccia G, Pilato G (2001) Road signs recognition
using a dynamic pixel aggregation technique in the HSV color
space. In: Proceedings of international conference on image
analysis and processing, Palermo, Italy, pp 572–577
36. Wei GQ, Ma SD (1994) Implicit and explicit camera calibration:
theory and experiments. IEEE Trans Pattern Anal Machine Intell
16(5):469–480
37. Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Machine Intell 22(11):1330–1334. http://research.microsoft.com/~zhang/Calib/
38. Zhang X, Hu W, Maybank S, Li X, Zhu M (2008) Sequential
particle swarm optimization for visual tracking. In: Proceedings
of IEEE conference on computer vision and pattern recognition
(CVPR’08). IEEE, pp 1–8
39. Ziraknejad N, Tafazoli S, Lawrence P (2007) Autonomous stereo
camera parameter estimation for outdoor visual servoing. In:
Proceedings of IEEE Workshop on machine learning for signal
processing, pp 157–162