GPU implementation of a road sign detector based on particle swarm optimization
Luca Mussi • Stefano Cagnoni • Elena Cardarelli •
Fabio Daolio • Paolo Medici • Pier Paolo Porta
Received: 8 February 2010 / Revised: 9 July 2010 / Accepted: 23 September 2010 / Published online: 15 October 2010
© Springer-Verlag 2010
Abstract Road sign detection is a major goal of Advanced Driving Assistance Systems. Most published work on this problem shares the same approach, by which signs are first detected and then classified in video sequences, even if different techniques are used. While detection is usually performed using classical computer vision techniques based on color and/or shape matching, classification is most often performed by neural networks. In this work we present a novel modular and scalable approach to road sign detection, based on Particle Swarm Optimization, which takes both shape and color into account. In particular, in our approach the optimization of a single fitness function allows one both to detect a sign belonging to a certain category and, at the same time, to estimate its position with respect to the camera reference frame. To speed up processing, the algorithm implementation exploits the parallel computing capabilities offered by modern graphics cards and, in particular, by the Compute Unified Device Architecture by nVIDIA. The effectiveness of the approach has been assessed on both synthetic and real video sequences, which have been successfully processed at, or close to, full frame rate.
Keywords Particle swarm optimization · Road sign detection · GPU computing · Parallel computing
1 Introduction
Automatic traffic sign detection and classification is a very
important issue for the Advanced Driver Assistance Sys-
tems (ADAS) and road safety. It can both improve safety
and help navigation, by providing critical information the
driver could otherwise miss, limiting or compensating for
drivers’ distractions. Because of this, several road sign
detectors have been developed in the last 10 years [24].
In most industrial systems only speed limit signs are
detected, since these are considered to be the most relevant
for safety. Nevertheless, information provided by warning
signs, mandatory signs and all the remaining prohibitory
signs can also be extremely significant: ignoring the pres-
ence of such signs can lead to dangerous situations or even
accidents. Automatic road sign detection systems can be
used to both warn drivers in these situations and supply
additional environmental information to other on-board
systems such as the Automatic Cruise Control (ACC), the
Lane Departure Warning (LDW), etc.
Both gray-scale and color cameras can be used to this
purpose: in the first case, search is mainly based on shape
and can be quite demanding in terms of computation time
[10, 19]. Using a color camera, the search can be based
mainly on chromatic information: color segmentation is, in
L. Mussi · S. Cagnoni (corresponding author) · E. Cardarelli · P. Medici · P. P. Porta
Dipartimento di Ingegneria dell’Informazione, University
of Parma, Viale G. Usberti 181a, 43124 Parma, Italy
e-mail: [email protected]
L. Mussi
e-mail: [email protected]
E. Cardarelli
e-mail: [email protected]
P. Medici
e-mail: [email protected]
P. P. Porta
e-mail: [email protected]
F. Daolio
Information Systems Institute (ISI) - HEC, University
of Lausanne, Internef 135, CH-1015 Lausanne, Switzerland
e-mail: [email protected]
Evol. Intel. (2010) 3:155–169
DOI 10.1007/s12065-010-0043-y
general, faster than shape detection, even if it requires
additional filtering; however, images acquired by inex-
pensive color cameras can suffer from artifacts deriving
from Bayer conversion or from other problems related, for
instance, to color balance [2].
The approaches to traffic sign detection which rely on
color images are usually based on color bases different
from RGB; the HSV/HSI color space is the most frequently
used [7, 35] but other color spaces, such as CIECAM97 [9],
can be used as well. On the one hand, these spaces separate
chromatic information from lighting information, making
detection of a specified color mostly independent of light
conditions. On the other hand, the RGB [32] and YUV [31]
color spaces require no transformations, or just very simple
ones; however, they require more sophisticated segmenta-
tion algorithms, since the boundary between colors is
fuzzier. In order to make detection more robust, both color
segmentation and shape recognition can be used in coop-
eration [9].
As regards sign recognition, most methods are based on
computational intelligence techniques [6], the most fre-
quent being neural networks [8, 11] and fuzzy logic [13].
In this paper we present a novel approach to road sign
detection based on geometric transformations of basic sign
templates. Our approach projects sets of three-dimensional
points, which sample significant regions of the road sign
template to be detected and describe its shape, onto the
image plane according to a transformation which maps 3D
points in the camera reference frame onto the image; the
transformed set of points is then matched to the corre-
sponding image pixels. The likelihood of detection is
estimated using a similarity measure between color histo-
grams [34]. This procedure can actually estimate the pose
of any object described by a 3D model, within any
projection system. One of the
advantages over other model-based approaches is that this
approach does not need any preliminary pre-processing of
the image (like, for example, color segmentation) or any
reprojection of the full three-dimensional model [34].
Another peculiar feature with respect to similar work [7],
besides the aforementioned similarity measure, is that our
method relies upon Particle Swarm Optimization (PSO) to
estimate, by a single transformation, the pose of the sign in
the 3D space at the same time as the position of the sign in
the image.
Despite being more efficient than many other meta-
heuristics, PSO is still rather demanding in terms of
computational resources; therefore, a sequential imple-
mentation of the algorithm would be too slow for real-time
applications. As for all other metaheuristics, this is espe-
cially true when the function to be optimized is itself
computationally complex. The PSO algorithm we have
used in this work has been implemented within the nVIDIA
CUDA environment [23, 25], to take advantage of the
computing power offered by the massively parallel archi-
tectures available nowadays even on cheap consumer video
cards. As will be shown, thanks also to the parallel nature
of PSO, this choice allowed the final system to manage
several swarms at the same time, each specialized in
detecting a specific class of road signs.
This paper is organized as follows: Sect. 2 briefly
introduces PSO and its parallel implementation within
CUDA; Sect. 3 addresses the problem of road sign detec-
tion, motivating our approach and offering further details
on how shape and color information is processed to com-
pute fitness. Finally, in Sect. 4, we report results obtained
on both a synthetic video sequence containing two signs
and on two real video sequences, acquired on-board a car,
for a total running time of about 30 min.
2 GPU implementation of particle swarm optimization
Particle Swarm Optimization is a simple but powerful
optimization algorithm, introduced by Kennedy and Eb-
erhart [15]. In the last decade many variants of the basic
PSO algorithm have been developed [18, 26, 29] and
successfully applied to many problems in several fields
[28], image analysis being one of the most frequent
ones. In fact, image analysis tasks can often be reformulated
as the optimization of an objective function directly derived
from the physical features of the problem being solved.
Beyond this, PSO can often be more than a way to 'tune' the
parameters of another algorithm: it can directly be the main
building block of an original solution. For example, [3, 5, 21, 27, 38] use PSO
to directly infer the position of an object that is sought
in the image.
2.1 PSO basics
PSO searches for the optimum of a fitness function, fol-
lowing rules inspired by the behavior of flocks of birds in
search of food. A population of particles move within the
fitness function domain (usually termed their search
space), sampling the function in the points corresponding
to their position. This means that, after each particle’s
move, the fitness computed at its new position is evaluated.
In their motion, particles preserve part of their velocity
(inertia), while undergoing two attraction forces: the first
one, called cognitive attraction, attracts a particle towards
the best position it visited so far, while the second one,
called social attraction, pulls the particle towards the best
position ever found by the whole swarm. Based on this
model, in basic PSO, the following velocity and position
update equations are computed for each particle:
Vi(t) = w · Vi(t − 1) + C1 · R1 · [Xb,i(t − 1) − Xi(t − 1)] + C2 · R2 · [Xgb,i(t − 1) − Xi(t − 1)]   (1)

Xi(t) = Xi(t − 1) + Vi(t)   (2)
where the subscript i refers to the i-th dimension of the
search space, V is the velocity of the particle, C1, C2 are
two positive constants, w is the inertia weight, X(t) is the
particle position at time t, Xb(t - 1) is the best fitness
position visited by the particle up to time t - 1, Xgb(t - 1)
is the best fitness point ever visited by the whole swarm; R1
and R2 are two random numbers from a uniform distribu-
tion in [0, 1].
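As a minimal NumPy sketch (our own naming, not the paper's CUDA code), Eqs. 1-2 can be applied to a whole swarm at once; the default w, C1 and C2 below are the values later reported in Sect. 4:

```python
import numpy as np

def pso_step(X, V, Xb, Xgb, w=0.723, c1=1.193, c2=1.193, rng=None):
    """One synchronous PSO update (Eqs. 1-2) for a whole swarm.

    X, V : (n_particles, n_dims) current positions and velocities
    Xb   : (n_particles, n_dims) per-particle best positions
    Xgb  : (n_dims,) swarm best position (global-best variant)
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(X.shape)  # R1, R2: fresh uniform draws per dimension
    r2 = rng.random(X.shape)
    V = w * V + c1 * r1 * (Xb - X) + c2 * r2 * (Xgb - X)  # Eq. 1
    X = X + V                                             # Eq. 2
    return X, V
```

In the ring-topology variant used later in the paper, `Xgb` would simply be replaced, for each particle, by the local best of its neighborhood.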
Many variants of the basic algorithm have been devel-
oped [29], some of which have focused on the algorithm
behavior when different topologies are defined for the
neighborhoods of the particles [16]. A usual variant of PSO
substitutes Xgb(t − 1) with Xlb(t − 1), which represents the
‘local’ best position ever found by all particles within a
pre-set neighborhood of the particle under consideration.
This formulation admits, in turn, several variants,
depending on the topology of such neighborhoods. Among
others, Kennedy and coworkers evaluated different kinds
of topologies, finding that good performance could be
achieved using random and Von Neumann neighborhoods
[16]. Nevertheless, the authors also indicated that selecting
the most efficient neighborhood structure is, in general, a
problem dependent task. Since the random topology is
usually designed such that each particle communicates
with two random neighbors, most often using a simple ring
topology is adequate for the problem at hand, while
allowing for an easier implementation.
Whatever the choices of the algorithm structure, para-
meters, etc., and despite good convergence properties, PSO
is still an iterative process which, depending on problem
difficulty, may require several thousands (if not millions)
of particle updates and fitness evaluations. Therefore,
designing efficient PSO implementations is a problem of
great practical relevance. This becomes even more critical,
if one considers real-time applications to dynamic envi-
ronments in which, for example, the fast convergence
properties of PSO may be used to track moving points of
interest (maxima or minima of a specific dynamically-
changing fitness function) in real time. This is the case, for
example, of computer vision applications in which PSO has
been used to track moving objects [22] or to determine
location and orientation of objects or people [12, 21].
2.2 Implementing PSO within CUDA
We implemented a standard PSO with particles organized
with the classical ring topology [23]. The rationale behind
this choice is, on the one hand, the inadequacy of PSO
with synchronous best update and global-best topology,
which would have been the most natural and easiest
parallel implementation, for optimizing multi-modal
problems [4]. On the other hand, as reported above, PSO
with ring topology provides a very good compromise
between quality of results, efficiency, and easiness of
implementation.
The parallel programming model of CUDA allows
programmers to partition the main problem in many sub-
problems that can be solved independently in parallel.
Each sub-problem may then be further decomposed into
many modules that can be executed cooperatively in
parallel. In CUDA, each sub-problem becomes a thread
block, which is composed of a certain number of threads
which cooperate to solve the sub-problem in parallel.
The software modules that describe the operation of each
thread are called kernels: when a program running on the
CPU invokes a kernel, a unique set of indices is assigned
to each thread, to denote to which block it belongs and
its position inside it. These indices allow each thread
to ‘personalize’ its access to the data structures and,
in the end, to achieve problem parallelization and
decomposition.
To exploit the impressive computation capabilities of
graphic cards effectively within CUDA and implement a
parallel version of PSO, the best approach is probably to
consider the main phases of the algorithm as separate
tasks, parallelizing each of them separately: this way,
each phase can be implemented by a different kernel and
the whole optimization process can be performed by
iterating the basic kernels needed to perform one gener-
ational update of the swarm. Since the only way CUDA
offers to share data among different kernels is to keep
them in global memory (i.e., the RAM region, featuring
the slowest access time by far, which is shared by the
processes run by the GPU and the ones run by the CPU)
[25], the current status of our PSO must be saved there.
Data organization is therefore the first problem to tackle
to exploit the GPU read/write coalescing capability and
maximize the degree of parallelism of the implementa-
tion. With our data design, it is enough to appropriately
arrange the thread indices to run several swarms at the
same time very efficiently.
In order to have all tasks performed on the GPU, and
avoid, as much as possible, the bottleneck of data exchange
with the CPU using global memory, we generate pseudo-
random numbers running the Mersenne Twister [20]
algorithm directly on the GPU using the kernel available
within the CUDA SDK: this way the CPU load is virtually
zero.
In the following we briefly describe the three kernels
into which our PSO implementation has been subdivided.
2.2.1 Position update
A computation grid, divided into a number of blocks of
threads, updates the position of all particles being simu-
lated. Each block updates the data of one particle, while
each thread in a thread block updates one element of the
position and velocity arrays. In the beginning the particle’s
current position, personal best position, velocity and local
best information are loaded, after which the classical PSO
equations are applied.
2.2.2 Fitness evaluation
This kernel is scheduled as a computation grid composed
of one block for each particle being simulated (irrespective
of the swarm to which it belongs). Each block comprises a
number of threads equal to the total number of points that
describe a sign (three sets of 16 points each) so that the
projection of all points on the current image is performed in
parallel. Successively, each thread contributes to building
the histograms described in Sect. 3.3: the thread index
determines to which set/histogram the projected point
under consideration belongs, while the sampled color value
determines which bin of the histogram is to be incre-
mented. Finally the fitness value is computed according to
Eq. 9 where, once again, histogram similarity is assessed in
parallel.
2.2.3 Bests update
For each swarm, a thread block is scheduled with a
number of threads equal to the number of particles in the
swarm. As already mentioned, in our system we have used
a ring topology with radius equal to 1. Firstly, each thread
loads in shared memory both the current and the best
fitness values of its corresponding particle, to update the
personal best, if needed. Successively, the current local
best fitness value is found by computing the best fitness of
each particle’s neighborhood (including the particle and
the neighboring one on both sides of the ring), comparing
it to the best value found so far and updating it, when
necessary.
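The bests-update step just described can be sketched as follows (a NumPy illustration under our own naming; fitness is maximized, as for the detector, and the ring has radius 1 with wrap-around):

```python
import numpy as np

def update_bests(fitness, best_fitness, positions, best_positions):
    """Personal- and local-best update for a ring topology of radius 1.

    Returns the updated personal bests plus, for each particle, the
    index of the best particle among itself and its two ring neighbors.
    """
    n = len(fitness)
    improved = fitness > best_fitness               # personal best update
    best_fitness = np.where(improved, fitness, best_fitness)
    best_positions = np.where(improved[:, None], positions, best_positions)
    idx = np.arange(n)
    neigh = np.stack([(idx - 1) % n, idx, (idx + 1) % n])   # (3, n) ring
    local_best_idx = neigh[np.argmax(best_fitness[neigh], axis=0), idx]
    return best_fitness, best_positions, local_best_idx
```

Particle i then uses `best_positions[local_best_idx[i]]` as its Xlb in the velocity update.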
3 Road sign detection
In this section, we first introduce the basics of projective
geometry which underlie the theory of image acquisition
by means of a camera. Then, we describe the road sign
detection algorithm, based on computer vision and PSO,
focusing, in particular, onto the fitness function, whose
optimization drives the whole detection process.
3.1 The image projection model and camera calibration
The main goal of computer vision is to make a computer
analyze and ‘understand’ the content of an image, i.e., the
projection of a region of the real world which lies within
the field of view of a camera onto the camera’s sensor
plane (the image plane), in order for it to be able to take
some decision based on such an analysis.
The simplest mathematical model which describes the
spatial relationships between the 3D real-world scene and
its projection on the image pixels is a general affine
transform of the following form:
pi = A · Pi   (3)
where Pi represents the 3D coordinates of a point in the
world, while pi represents its 2D projection on the image
expressed with homogeneous coordinates. The matrix
A ∈ M3×3 models a central linear projection and is
usually expressed as

        ⎡ fx   0   u0 ⎤
    A = ⎢  0   fy  v0 ⎥ .   (4)
        ⎣  0   0    1 ⎦
Here, briefly, fx and fy are, respectively, estimates of the
focal length along the x and y direction of the image and
(u0,v0) is an estimate of the principal point (a.k.a. the
‘center of projection’) of the image. The process whose
aim is to determine the above parameters as well as the
position of the camera in the world (not needed in our case)
is usually referred to as camera calibration. The so-called
extrinsic parameters describe the camera position and
orientation while the intrinsic ones are those appearing in
Eq. 4, i.e. focal lengths and center of projection.
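The central projection of Eqs. 3-4 can be sketched as follows (the intrinsic parameter values used in the test are illustrative, and the function names are ours):

```python
import numpy as np

def project(P, fx, fy, u0, v0):
    """Central projection of 3D camera-frame points (Eqs. 3-4).

    P is (n, 3); returns (n, 2) pixel coordinates after the homogeneous
    divide. Points are assumed to lie in front of the camera (Z > 0).
    """
    A = np.array([[fx, 0.0, u0],
                  [0.0, fy, v0],
                  [0.0, 0.0, 1.0]])
    p = P @ A.T                   # homogeneous image coordinates (Eq. 3)
    return p[:, :2] / p[:, 2:3]   # divide by the third coordinate
```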
In the literature many algorithms for camera calibration
have been proposed for monocular [30, 36, 37] and stereo
systems [17, 39]. Many of them are based on particular
hypotheses that ease the calibration step; usually these
hypotheses are not verified in automotive environments,
such as short-distance perception, still scenarios, or still
camera. In our case, we can consider the last two con-
straints to be satisfied, since we are not taking into account
the correlation between subsequent frames of a video
sequence to infer a model of the car motion and forecast
trajectories, but analyze each frame as an independent
image. We also set the origin of our ‘world’ reference
frame which, in our case, is coincident with the camera
frame, to be a fixed point in the car (see Fig. 1).
Other issues to be tackled in general outdoor and,
especially, in automotive applications are related to spe-
cific conditions of outdoor environments. In fact, temper-
ature and illumination conditions can vary and can be
barely controlled. Regarding illumination, in particular,
extreme situations like direct sunlight or strong reflections
must be taken into account. Other light sources, such as car
headlights or reflectors, interfering with the external envi-
ronmental light, might also be present in a typical auto-
motive scene.
3.2 PSO-based road sign detection algorithm
Suppose that an object of known shape and color, having
any possible orientation, may appear within the field of
view of a calibrated camera. In order to detect its presence
and, at the same time, to precisely estimate its position, one
can use the following algorithm (see also [34]):
1. Consider a set of key contour points, of known
coordinates with respect to a reference position, and
representative of the shape and colors of the object.
2. Translate (and rotate) them to a hypothesized position
visible by the camera and project them onto the image.
3. Verify that color histograms of the sets of key points
match those of their projection on the image to assess
the presence of the object being sought.
Road signs are relatively simple objects belonging to
few possible classes characterized by just a few regions of
homogeneous colors. Each sign class can be described by a
model consisting of a few sets of key points which lie just
near the color discontinuities, with points belonging to the
same set being characterized by the same color. Once all
points in a set are projected onto the image plane, one must
verify that the colors of the corresponding pixels in the
image match the ones in the model. A further set of points,
lying just outside the object silhouette, can help verify
whether the object border has been detected: this is, in
general, confirmed when colors of corresponding pixels in
such a region are significantly different from those of the
object.
In Fig. 2 we show three classes of traffic signs (priority,
warning, and prohibitory signs), along with the sets of
points of the model we use to represent them. For each
model, we consider three sets of 16 points: one lies just
outside the external border (therefore, on the image back-
ground), one on the red band just inside the external border,
and one on the central white area, as close to the red border
as possible. Please notice that, for the prohibitory signs, we
use points uniformly distributed along their circular border
while, for the triangular priority and warning signs, points
are more densely distributed in proximity of the corners.
This choice reduces the chance of mismatching circular
signs to triangular ones since, at a similar scale, the corners
of triangular signs lie well outside the borders of the cir-
cular ones.
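For illustration only, the three concentric 16-point rings of a circular sign model could be generated as follows; the radii below are hypothetical, not the Italian standard dimensions of Fig. 2:

```python
import numpy as np

def circular_sign_sets(r_outer=300.0, band=50.0, margin=10.0, n=16):
    """Illustrative 16-point sets for a circular (prohibitory) sign model.

    Coordinates are in millimeters on the xy plane (z = 0): set 1 lies
    just outside the border (background), set 2 on the red band, set 3
    just inside it on the white inner area.
    """
    angles = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    def ring(r):
        return np.stack([r * np.cos(angles), r * np.sin(angles),
                         np.zeros(n)], axis=1)
    return (ring(r_outer + margin),          # set 1: background
            ring(r_outer - band / 2),        # set 2: red band
            ring(r_outer - band - margin))   # set 3: white inner area
```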
If a calibrated camera is available on a moving car,
given an estimate of the position and rotation of a road sign
inside the camera’s 3D field of view, the sets of points in
the world reference frame can be roto-translated to this
position and then projected onto the image plane, to verify
the likelihood of the estimate by matching color histo-
grams. All that is needed for detection is a method to
generate estimates of sign positions and refine them until
the actual position of a sign is found.
When a pose estimate is available for a sign, all points
belonging to its model can then be projected onto the
image plane using the following equation:
pi = A · (Re · Pi + te)   (5)
where te represents the offset/position of the sign in the x,
y and z directions with respect to the camera mounted on
the car (in our case, to the world reference system, as well).
Re is a 3 × 3 rotation matrix derived from the estimate of
the sign rotation: since a free rotation in the 3D space can
always be expressed with three degrees of freedom, it is
sufficient to estimate three values (e.g., the rotation angles
around the three axes) in order to represent all possible
rotations of a sign.
To this aim we apply PSO, as introduced in Sect. 2. In
our method, each swarm generates location estimates for a
specific class of signs; each particle in the swarm encodes
an estimate of the sign position by four values, which
represent its offsets along the x, y and z axes, as well as its
rotation around the vertical axis (yaw) in the camera ref-
erence frame. Although our system is already structured for
estimating all six degrees of freedom of a pose estimate, we
deliberately chose to ignore the rotation around the camera
optic axis (roll) and the horizontal axis (pitch) after some
preliminary tests. Although it makes sense to have the
system able to estimate every possible rotation of a sign,
we had no experimental evidence about this need, at least
Fig. 1 The projection model used in our system: OwX, OwY, OwZ are
the three axes of the world reference system (which in our case is
coincident with the camera reference system), f is the focal distance,
(u0,v0) is the projection center, and pi : (ui, vi) is the projection of a
generic point Pi : (Xi, Yi, Zi) onto the image plane
for the general road configurations we dealt with in our
tests. In fact, introducing all three angles would not affect
the complexity of the fitness function, since the full
transformation is already computed anyway but, of course,
it would significantly increase the size of the PSO search
space.
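Putting Eq. 5 together with the four-value particle encoding, the projection of a sign model for one particle can be sketched as below; taking the yaw rotation about the y (vertical) axis, and its sign, are our assumed conventions:

```python
import numpy as np

def project_model(points, particle, A):
    """Sketch of Eq. 5 for one particle encoding (tx, ty, tz, yaw).

    `points` is the (n, 3) set of model key points and `A` the 3x3
    intrinsic matrix of Eq. 4. Returns (n, 2) pixel coordinates.
    """
    tx, ty, tz, yaw = particle
    c, s = np.cos(yaw), np.sin(yaw)
    Re = np.array([[ c, 0., s],     # rotation about the vertical axis
                   [0., 1., 0.],
                   [-s, 0., c]])
    te = np.array([tx, ty, tz])
    p = (points @ Re.T + te) @ A.T  # pi = A (Re Pi + te), Eq. 5
    return p[:, :2] / p[:, 2:3]     # homogeneous divide
```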
A particle swarm can then be used to detect the presence
of signs of a specific class within the image, by assigning a
fitness value to each position estimate encoded by its par-
ticles. Such a value is proportional to the similarity
between the projections onto the image plane of the points
belonging to the sign model, obtained according to Eq. 5,
and the corresponding image pixels. If the fitness value in
the point (particle position) under evaluation is above a
given threshold, we consider that point to be the location of
a sign of the class associated to the swarm.
This is the main feature that characterizes this algorithm.
In fact, having an accurate estimation of the position and
orientation of a sign offers the possibility to rectify its
image by means of a simple Inverse Perspective Mapping
(IPM) transform [1], in order to obtain a pre-defined view.
This means it is always possible to obtain a standardized
view (in terms of size and relative orientation) of the
detected sign. This is the optimal input for a classifier
whose purpose is to recognize the content of a detected
sign irrespective of its actual orientation and distance.
Having the signs pre-classified into different classes in the
detection phase is a further significant advantage which
makes recognition easier and more accurate, since a sep-
arate classifier can then be used for each class. At the
moment, we only focus on detection: no classification of
the signs which have been detected is performed, even if
we plan to add a sign recognition module to our system in
the immediate future.
PSO is run at each new frame acquisition for a pre-
defined number of generations. Actually, the algorithm
structure and its GPU implementation make it possible to
schedule more than one PSO run per frame. On the one hand,
this offers a second opportunity to detect a sign which was
missed in the previous run on that frame. On the other hand,
it also allows each swarm to detect several signs belonging to
the same class in the same frame. In fact, when a sign is
detected in the first run, it is 'hidden' in order to prevent the
corresponding swarms from focusing on it again during the
subsequent runs.
In the next subsection we describe in detail the fitness
function we use in our PSO-based approach.
3.3 Fitness function
Let us denote the three sets of points used to describe each
sign class (see, for example, the models in Fig. 2) as
S1 = {s^1_i}, S2 = {s^2_i} and S3 = {s^3_i}, with s^x_i ∈ R^2 (they all
lie on the xy plane) and i ∈ [1, 16]. Based on the position
encoded by one particle and on the projection matrix
derived from the camera calibration, each set of points is
roto-translated and projected onto the current frame,
obtaining the corresponding three sets of points which lie
on the image plane: P1 = {p^1_i}, P2 = {p^2_i} and P3 = {p^3_i}.
To verify whether the estimated position is actually
correct, three color histograms [33] in the HSV colorspace,
one for each channel, are computed for each set Px with
x ∈ {1, 2, 3}. Let us denote each of them as H^c_x, formally
defined as:

H^c_x(b) = (1/n) Σ_{i=1}^{n} δ(I_c(p^x_i) − b)   (6)

where c ∈ {H, S, V} specifies the color channel, x ∈ {1, 2, 3}
identifies the set of points, b ∈ [1, Nbin] (Nbin being the
number of bins in the histogram), n represents the number
of points in the set (sixteen in our case), the function
δ(n) returns 1 when n = 0 and zero otherwise and, finally,
I_c(p) : R^2 → R maps the intensity of channel c at pixel
[Plots for Fig. 2: three scatter plots of the model point sets; axes in millimeters, ranging from −500 to 500; legend: set 1, set 2, set 3.]
Fig. 2 The three different sets of points used to represent a priority sign (left), a warning sign (center), and a prohibitory sign (right). The
dimensions of these models conform to the Italian standards (largest versions). All coordinates are expressed in millimeters
location p to a certain bin index. The term 1/n is used
to normalize the histogram such that Σ_{b=1}^{Nbin} H^c_x(b) = 1.
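A minimal sketch of the histogram of Eq. 6, assuming integer channel intensities in [0, 256); the binning scheme and names are ours:

```python
import numpy as np

def channel_histogram(values, n_bins, vmax=256):
    """Normalized histogram of Eq. 6 for one channel and one point set.

    `values` are the n sampled channel intensities I_c(p_i); each is
    mapped to one of `n_bins` bins and the counts are divided by n, so
    the bins sum to 1.
    """
    values = np.asarray(values)
    bins = (values * n_bins) // vmax      # bin index of each sample
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / len(values)
```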
Moreover, three additional histograms, denoted as H^c_ref, are
used as reference histograms for the red band surrounding all
three sign models taken into consideration. The
Bhattacharyya coefficient ρ [14], which offers an estimate
of the amount of overlap between two statistical samples, is
then used to compare the histograms:
ρ(H1, H2) = Σ_{b=1}^{Nbin} √(H1(b) · H2(b)).   (7)
The Bhattacharyya coefficient returns a real value
between 0, when there is no overlap at all between the
two histograms, and 1, when the two histograms are
identical. Finally, if we use
S_{x,y} = [ρ(H^H_x, H^H_y) + ρ(H^S_x, H^S_y) + ρ(H^V_x, H^V_y)] / 3   (8)
to express the similarity of the two triplets of histograms
computed for the sets of points x and y, we can express the
fitness function as
f = [k0 · (1 − S_{1,2}) + k1 · (1 − S_{2,3}) + k2 · S_{1,ref}] / (k0 + k1 + k2)   (9)

where k0, k1, k2 ∈ R+ are used to weigh the contributions
of the three terms appearing in the above equation.
Such a fitness function requires that:
• histograms computed on the first two sets of points be
as different as possible, hypothesizing that, in case the
sign had been detected, the background color near the
sign would differ significantly from that of the red band.
• the histogram of the points in the red band be as
different as possible from the one computed on the
inner area of the sign.
• histograms Hc1 resemble as much as possible the
reference histograms Hcref computed for the red band
surrounding the sign.
Histograms of regions having colors that differ only
slightly from the model, possibly because of noise, produce
high values of S1,ref. The fitness function f will therefore be
close to 1 only when the position of one particle is a good
estimate of the sign pose in the scene captured by the
camera.
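Eqs. 7-9 can be sketched as follows (the k defaults are the values reported in Sect. 4; the function names are ours):

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient between two normalized histograms (Eq. 7)."""
    return np.sum(np.sqrt(np.asarray(h1) * np.asarray(h2)))

def similarity(Hx, Hy):
    """Mean Bhattacharyya coefficient over channel histograms (Eq. 8).

    Hx, Hy map channel names to histograms; the paper uses H, S, V but,
    as discussed above, can drop channels, so we average whatever is given.
    """
    return np.mean([bhattacharyya(Hx[c], Hy[c]) for c in Hx])

def fitness(S12, S23, S1ref, k0=1.4, k1=1.0, k2=0.8):
    """Fitness of Eq. 9, combining the three similarity terms."""
    return (k0 * (1 - S12) + k1 * (1 - S23) + k2 * S1ref) / (k0 + k1 + k2)
```

With identical histograms the coefficient is 1, with disjoint ones it is 0, and a fitness close to 1 requires S1,2 and S2,3 near 0 together with S1,ref near 1.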
Actually, despite this being the most natural and general
way to express this sort of fitness function, we noticed that
system performance improved if we ignored the V (Value,
or Intensity) channel altogether. The reason probably lies
in the fact that the lighting conditions of the region
surrounding a sign are usually rather uniform, which
makes intensity information useless for the discriminant
properties of the fitness function, even if it affects its value.
At the same time, we verified that, in evaluating the
reference (red) color, it is preferable to also neglect the
S (Saturation) channel. This means that, for the red band,
we only use the H (Hue) channel, which is the only channel
that encodes pure color information.
4 Experimental results
The PSO parameters were set to w = 0.723, C1 = C2 = 1.193. Three swarms of 64 particles were run for up to 200 generations per frame to detect regulatory signs (of circular shape), warning signs (of triangular shape), and priority signs (of reversed triangular shape), respectively. The coefficients appearing in Eq. 9 were empirically set as follows: k0 = 1.4, k1 = 1.0, k2 = 0.8. The fitness threshold above which we considered a detection to have occurred was set to 0.9. The search space, in world coordinates, ranged from -4 m to 6.5 m along the horizontal direction (x axis), from -1.6 m to 6 m vertically (y axis), and from 9.5 m to 27 m in the direction of the car motion and of the camera optical axis (z axis). Finally, we allowed a range [-π/4, π/4] for sign rotation with respect to the vertical axis. All previous settings were chosen empirically, after some preliminary tests, so as to represent general values, independent of any particular image sequence, defining a reasonable invariant region for the PSO to explore.
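As a concrete illustration of these settings, the sketch below runs a standard global-best PSO with the inertia and acceleration values above over the stated 4-D search space (x, y, z in millimeters, yaw in radians). It is only a schematic stand-in for the paper's CUDA implementation: the toy fitness peaks at a hypothetical known pose, whereas the real fitness is, of course, the image-based function described above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Search-space bounds from the text: x, y, z offsets in mm, yaw in radians
LOW  = np.array([-4000.0, -1600.0,  9500.0, -np.pi / 4])
HIGH = np.array([ 6500.0,  6000.0, 27000.0,  np.pi / 4])
W, C1, C2 = 0.723, 1.193, 1.193   # inertia and acceleration coefficients

def pso(fitness, n_particles=64, generations=200):
    """Global-best PSO over the 4-D sign-pose space; returns the best pose found."""
    pos = rng.uniform(LOW, HIGH, size=(n_particles, 4))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.apply_along_axis(fitness, 1, pos)
    gbest = pbest[np.argmax(pbest_fit)].copy()
    for _ in range(generations):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = W * vel + C1 * r1 * (pbest - pos) + C2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, LOW, HIGH)   # keep particles inside the search space
        fit = np.apply_along_axis(fitness, 1, pos)
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[np.argmax(pbest_fit)].copy()
    return gbest, float(pbest_fit.max())

# Toy fitness: 1 at a hypothetical true pose, decreasing with normalized distance
true_pose = np.array([1200.0, 900.0, 15000.0, 0.1])
fitness = lambda p: 1.0 - np.linalg.norm((p - true_pose) / (HIGH - LOW))

best, best_fit = pso(fitness)
```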
The overall test of the system was divided into two separate phases. In the first, synthetic video sequences were used to assess the ability of the system to correctly find and estimate sign poses. In the second, more significant phase, real-world images were processed to assess the system's detection performance in typical urban and suburban environments.
4.1 Tests on synthetic images
In the first test phase, we simulated a 3D rural environment with a road and a pair of traffic signs using the public-domain raytracer POV-Ray¹. We relied on the Roadsigns macros by Chris Bartlett² to simulate the signs, and on some ready-to-use objects by A. Lohmuller and F. A. Lohmuller³ to simulate the road. Bumps and dirt were added to the traffic signs to simulate more realistic conditions. Figure 3 shows a sample frame from one of the synthetic sequences. As time passes, the simulated car moves forward, zigzagging from left to right. At the same time, as they get closer to the car, the two signs rotate around their vertical axes. We introduced these rotations to test the ability of our system to estimate the actual roto-translation between the camera and the detected sign. In fact, in our case, each particle moves in R^4 and its position represents the x, y and z offsets of the sign as well as its rotation with respect to the vertical axis (yaw).

1 http://www.povray.org
2 http://lib.povray.org/collection/roadsigns/chrisb2.0/roadsigns.html
3 http://f-lohmueller.de/pov_tut/objects/obj_500i.htm

Evol. Intel. (2010) 3:155–169
Figure 4 shows three frames from the very beginning, the middle, and the end of the sequence, respectively. Image contrast has been reduced in this figure to better highlight the swarm positions. The white points superimposed on the images represent the best-fitness estimate of the sign position, while the black points depict the hypotheses represented by all other individuals. In Fig. 4a it is possible to see the two swarms during the initial search phase: in this case both are on the wrong target, despite already being in the proximity of a sign. Figure 4b, c show how the two swarms correctly converged onto their targets.
For a more detailed performance analysis, Fig. 5 shows
results obtained in estimating the actual position of the
signs throughout the sequence.
Figure 5 (top left) shows the actual x position and the estimated one (mean and standard deviation over one hundred runs), versus the frame number, for both the warning (light line) and the regulatory (dark line) signs. As can be seen, the horizontal position of the two signs is correctly detected until the end of the sequence with a precision of the order of centimeters. The sinusoidal trend of the two positions reflects the zigzagging behaviour of the simulated car. The top right part of the figure shows the results for the y coordinate. This time the actual position is constant, since the simulated car is supposed to have a constant pitch. Again, the estimated position is correct, with errors of just a few centimeters. Similar considerations can be made for the bottom left graph of Fig. 5, which reports
results of depth (z coordinate) estimation. Even if the signs are rather far from the car (about fifteen meters at the beginning), the estimates are very good, with errors of less than half a meter. The error is mostly due to the distance between the two outermost sets of points, which introduces a tolerance in the estimation of the actual position of the target border. Tightening this distance could improve the precision of the results but, at the same time, would make it more difficult to obtain high fitness values for signs which are far from the car: in that situation, the depth value is large and the projections of these two sets of points onto the image almost overlap, producing very similar color histograms.

Fig. 3 Sample frame taken from our synthetic video sequence simulating a country road scenario with two differently shaped road signs

Fig. 4 Output of a run of our road sign detection system at the very beginning (a), at middle length (b) and near the end (c) of a synthetic video sequence
Finally, the bottom right part of the figure shows the results of yaw estimation. In this case the results are not as precise: the rotation of the warning sign is estimated rather accurately, even if, in the first third of the simulation, the standard deviation is very high, while the rotation of the regulatory sign seems to be estimated almost at random. Since we are dealing with small angles, this must be due, again, to the tolerance in locating the signs introduced by the distance between the two external sets of points, which is more likely to affect fitness when matching circular signs than triangular ones.
4.2 Real-world tests
The system was then validated on two real-world image
sequences acquired with cameras placed on-board a car.
We compared results obtained by our system with those
obtained by a road sign recognition system previously
developed by some of the authors [2], affiliated to VisLab4,
the computer vision laboratory of our Department at the
University of Parma. For this reason we will refer to such a
system as the VisLab system from now on.
The benchmark used for this comparison comprised two of the sequences that were acquired during its development. The first sequence includes 10,000 frames with a resolution of 750 × 480 pixels, acquired at 7.5 fps by a PointGrey Firefly color camera, and is therefore about 22 min long; it was recorded while driving on the 'Parma orbital' on a sunny day. Since the road circumnavigates the town and the sequence includes a full tour of the orbital, it contains images featuring all possible light orientations. The second sequence, having the same resolution, is a little less than 5,000 frames long and was acquired by an AVT Guppy color camera at 7.5 fps on the 'Turin orbital' on a cloudy day. Images in this sequence feature more constant lighting but lower contrast. A significantly long segment of this sequence was acquired while driving in an urban environment, while most of the first
sequence was acquired on a separate-lane road. In the two sequences, the car on which the cameras were mounted runs at speeds ranging from 0 (a few crossings and roundabouts are present) to more than 70 km/h.

Fig. 5 Position estimation errors: (a) shows the estimates of the horizontal position (x coordinate) for both signs, (b) shows the vertical position estimates, and (c) the depth estimates. The estimates of the sign rotations around the vertical axis are shown in (d)

4 http://www.vislab.it
Using the settings reported above, signs were detected at an average distance from the car of about 12 m. Figure 6 displays screenshots of the system taken while running it: the main window area shows the camera view currently being processed, while the right vertical panel keeps track of the last road sign detected for each of the three categories (regulatory, warning, and priority; see caption).
In Fig. 7 it is possible to see some examples of the signs which could be detected, as they appear after rectification. Signs (a)–(c) were detected under normal lighting conditions. Signs (d) and (e) were in direct sunlight, while sign (f) was strongly backlit. Images (g) and (h) were the only two false positives we ever observed during all our tests: image (g) shows a red car partially occluded by another one, which creates an inverted triangular shape with a partially red border, and was detected as a priority sign, while (h), a detail of a commercial poster showing a red line which forms a triangle with a red border, was (correctly?) detected as a warning sign. Sign (i) was partially covered by a tree. Sign (j) was depicted inside a bigger yellow panel. The internal region of signs (b) and (i)–(l) is not white, which highlights the system's characteristic of looking for a red band coupled with two color discontinuities, one on either side. Finally, signs (m)–(o) were correctly detected in extremely poor lighting conditions: in the original frames, shown in Fig. 8a, they were difficult to see even for humans. Looking again at the rectified sign (a), and comparing it to the original image shown in Fig. 8b, it is possible to notice how the rectification process almost reduced the sign's rotation with respect to the vertical axis to zero, causing it to appear as if it had been observed from a perpendicular frontal viewpoint.
Fig. 6 Sample screen snapshots taken while processing real-world images. On the right, the images of the detected signs, correctly rectified to a standard frontal view, are shown: the top frame shows prohibitory signs, the middle frame warning signs, and the bottom frame priority signs. Images of the last detected signs remain visible in the window until a new sign of the same category is detected. The rectangle highlights the sign which was last detected
Validation software based on the unique code assigned to each sign made it possible to further assess the effectiveness of the system. Based on the annotations made during the development of the VisLab system, the validation software produced detection statistics in terms of false positives, false negatives, and correct detections (true positives).
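Such per-sequence statistics can be reproduced from the annotations with a simple tally. The sketch below is our reconstruction, assuming each annotated and each detected sign carries its unique code; the function name is ours, not the validation software's:

```python
def detection_stats(annotated, detected):
    """Compare a sequence's annotated sign codes against the detected ones.
    Returns (correct detections, false negatives, false positives)."""
    annotated, detected = set(annotated), set(detected)
    true_pos  = len(annotated & detected)   # signs present and detected
    false_neg = len(annotated - detected)   # signs present but missed
    false_pos = len(detected - annotated)   # detections with no matching sign
    return true_pos, false_neg, false_pos
```

For example, detection_stats(["A1", "B2", "C3"], ["A1", "C3", "X9"]) returns (2, 1, 1): two correct detections, one miss, one false positive.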
The above statistics were compared with those obtained, on the same sequence, by the VisLab system. Such a system implements a three-step process (see Fig. 9): color segmentation, shape detection, and classification based on several neural networks. Since illumination conditions deeply affect the first step, a dedicated gamma correction method was developed to compensate for the color drifts typical of sunset or dawn [2].

Fig. 7 Some of the signs that were detected, after rectification. Our system is able to detect signs in very different lighting conditions
Results obtained on the sequence shot in Parma are compared in Table 1, while Table 2 compares the results obtained on the sequence shot in Turin.
Since PSO is a stochastic algorithm, we also assessed
our system’s performance repeatability by executing sev-
eral runs for each sequence. The tables report, for our
system, the best and worst results obtained in the test runs.
In general, the performance of our system was comparable to, and often better than, that of the VisLab system in terms of correct detections. The system was also very selective, and only generated the two false positives reported in Fig. 7 which, curiously enough, were only noticed while annotating the results by hand and were not counted as false positives by the validation software because, by chance, the same frames actually contained a sign of exactly the same kind.
Considering the conditions in which the two test
sequences have been acquired and the results yielded by
the two systems on such sequences, our system seems to be
more robust than VisLab’s with respect to light changes
and critical conditions such as backlight or blinding. In
fact, while the results on the sequence shot in Turin are comparable, our system outperforms VisLab's on the sequence shot in Parma. The only exception, noticeable in both sequences, regards priority signs. This might seem to contrast with the good performance obtained on warning signs, with which they share shape and reference point sets, differing only in orientation. However, one should notice that the pole on which the signs are usually mounted intersects the warning sign's contour in a position (the middle of one side) where few reference points are present, while it does so close to a vertex of the priority sign, where the reference points for such a sign are denser. This suggests that the fitness function may be negatively affected by such an 'interference', and that a different distribution of reference points for this sign class may limit the problem. Another possible explanation is that priority signs are often mounted above other signs, such as roundabout signs, which could also affect the fitness function.
4.3 Computation efficiency
The CUDA implementation of the whole system described above made it possible to achieve very good execution times. Experiments were run on two different machines. The first is equipped with a 64-bit Intel(R) Core(TM)2 Duo processor running at 1.86 GHz and a moderately priced nVIDIA GeForce 8800GT video card, equipped with 1 GB of video RAM and fourteen multi-streaming processors (MPs), which adds up to a total of 112 processing cores. The second is powered by a 64-bit Intel(R) Core(TM) i7 CPU running at 2.67 GHz, combined with a top-level Quadro FX5800 graphics card, also by nVIDIA, having 4 GB of video RAM and thirty MPs or, in other words, 240 processing cores.

Fig. 8 Original frames where signs (m)–(o) (above), and (a) (below) of Fig. 7 were detected

Fig. 9 Block diagram of the VisLab system used as reference (from [2])
In spite of the great disparity between the two setups, no major differences were observed in execution times. This suggests that our system, despite the large number of operations required, still does not saturate the computational power of this kind of GPU.
All our tests were performed off-line. The input video sequences were stored as sets of individual frames, each loaded from disk with no streaming. Therefore, all our timings exclude disk input/output latency. In fact, live recording from a camera connected to the PC would permit transferring image data directly into the computer RAM in almost negligible time, avoiding slow disk accesses.
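The timing methodology can be sketched as follows: the per-frame clock is started only after a frame has been loaded, so disk latency never enters the measured processing time. The function names are ours, not those of the actual test harness.

```python
import time

def timed_fps(frame_refs, load, process):
    """Process a sequence frame by frame, timing only the processing step
    and excluding disk input/output, as in the experiments reported here."""
    total = 0.0
    for ref in frame_refs:
        frame = load(ref)                 # disk access: excluded from the timing
        t0 = time.perf_counter()
        process(frame)                    # detection work: this is what gets timed
        total += time.perf_counter() - t0
    return len(frame_refs) / total        # effective frames per second

# Toy example: loads take ~20 ms each, processing ~1 ms, so the
# reported rate reflects only the processing cost
fps = timed_fps(range(5),
                load=lambda ref: time.sleep(0.02) or ref,
                process=lambda frame: time.sleep(0.001))
```

Under this convention, a measured processing time of about 50 ms per frame corresponds to the reported 20 fps, regardless of how slow the disk reads are.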
With this setup, we obtained good performance with many different system settings. For example, simultaneously employing three swarms (one for each sign class under consideration), each composed of 64 particles and activated twice per frame for 200 generations, yielded a processing speed of about 20 frames per second (about 50 ms of actual processing time per frame). The frame rate improved to about 24 fps if each swarm was run just once per frame. Using swarms of 32 particles resulted in about 30 fps and 48 fps when two runs or one single run per frame were executed, respectively. In another set of tests, where only 100 generations per run were simulated, processing speed further increased to about 50 fps with two PSO runs per frame, and to 65 fps with a single run.
Running each swarm more than once per frame and 'hiding', in the subsequent runs, any sign previously detected allows a swarm to deal effectively with the presence of pairs of signs of the same class in the same frame, an event which occurs rather frequently.
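A minimal sketch of the 'hiding' step, under the assumption that it can be implemented by blanking out a detected sign's image region before the next run of the same swarm (the function name, bounding-box interface, and fill value are ours):

```python
import numpy as np

def hide_detection(frame, bbox, fill=0):
    """Return a copy of the frame with a detected sign's bounding box
    blanked out, so a subsequent run of the same swarm cannot score
    a high fitness on the sign that was already found."""
    x0, y0, x1, y1 = bbox
    masked = frame.copy()
    masked[y0:y1, x0:x1] = fill
    return masked

frame = np.ones((480, 750), dtype=np.uint8)   # resolution of the test sequences
masked = hide_detection(frame, bbox=(100, 50, 160, 110))
```

The original frame is left untouched, so the other swarms (looking for different sign classes) still see the full image.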
A sequential version of the algorithm would require up to about half a second per frame for the most demanding (and best-performing) of the settings reported above. This means that the same system could process video at no more than 4–5 fps, a speed which is not acceptable for this kind of application.
Therefore, our system is able to detect many types of road signs while processing images at a speed close to full frame rate. Considering that many existing systems actually work at speeds significantly lower than full frame rate, some time also remains available to perform sign classification. From this point of view, being able to reconstruct a frontal view of a given size for each sign detected by our system allows rather simple classifiers to be used in the recognition stage. Therefore, we expect that embedding such a stage in our system, which is indeed our final goal, will not affect processing speed significantly.
Even if it is not possible to directly compare the processing performance of our system with that of the VisLab system (which is stated to be able to run at about 13 fps, including sign classification, on a dual-processor Pentium PC with a clock frequency of 2 GHz), the performance of our system appears to be competitive with our reference.

Table 1 Results for the CUDA-PSO and the VisLab (reference) systems on the 'Parma orbital' sequence

Category (tot)   | VisLab: False Pos | False Neg | Correct Det | CUDA-PSO-RSD: False Pos min–max | False Neg min–max | Correct Det min–max
Priority (29)    | 4 | 9 (31%) | 20 (69%) | 1–1 | 9–10 (31–34.5%) | 19–20 (65.5–69%)
Prohibitory (51) | 1 | 33 (64.7%) | 18 (35.3%) | 0–0 | 20–26 (39.2–51%) | 25–31 (49–60.8%)
Warning (44)     | 0 | 23 (52.3%) | 21 (47.7%) | 1–1 | 10–13 (22.7–29.5%) | 31–34 (70.5–77.3%)
Total (124)      | 5 | 65 (52.4%) | 59 (47.6%) | 2–2 | 39–49 (31.5–39.5%) | 75–85 (60.5–68.5%)

Table 2 Results for the CUDA-PSO and the VisLab (reference) systems on the 'Turin orbital and downtown' sequence

Category (tot)   | VisLab: False Pos | False Neg | Correct Det | CUDA-PSO-RSD: False Pos min–max | False Neg min–max | Correct Det min–max
Priority (14)    | 0 | 1 (7.1%) | 13 (92.9%) | 0–0 | 4–8 (28.6–57.1%) | 6–10 (42.9–71.4%)
Prohibitory (53) | 4 | 22 (41.5%) | 31 (58.5%) | 0–0 | 20–22 (37.7–41.5%) | 31–33 (58.5–62.3%)
Warning (45)     | 0 | 10 (22.2%) | 35 (77.8%) | 0–0 | 7–8 (15.6–17.8%) | 37–38 (82.2–84.4%)
Total (112)      | 4 | 33 (29.5%) | 79 (70.5%) | 0–0 | 31–38 (27.7–33.9%) | 74–81 (66.1–72.3%)
5 Conclusions and future directions
We have shown that PSO, provided with a suitable fitness
function, can effectively detect traffic signs in real time.
Experimental results on both synthetic and real video
sequences showed that our system is able to correctly
estimate the position of the signs it detects with a precision of about ten centimeters in all directions, with depth being (rather obviously) the least accurate.

We have considered signs of three possible classes which, in normal road settings, account for far more than half of all sign occurrences. In any case, the three classes taken into consideration are also those most likely to be confused with one another. In fact, the sign classes omitted from our analysis have features that are mostly 'orthogonal' both to those characterizing the signs presently sought and to one another. These considerations, along with the modularity of our system, lead us to expect that extending the system to those other classes would be feasible with only small changes, obtaining comparable results in terms of quality. As concerns processing speed, the considerations about scalability made in Sect. 4.3 also induce optimistic expectations.
Finally, the way our system detects road signs permits
us to re-project all the images of the detected signs back to
a standard frontal view which represents the optimal input
for a classification step. Because of this, the introduction of
a classification module into the system has the highest
priority in our near-future agenda.
Acknowledgments We would like to express our thanks and
appreciation to Gabriele Novelli, Denis Simonazzi and Marco
Tovagliari for their help in tuning, testing and assessing the perfor-
mances of our system.
References
1. Mallot HA, Bulthoff HH, Little JJ, Bohrer S (1991) Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biol Cybern 64(3):177–185
2. Broggi A, Cerri P, Medici P, Porta PP, Ghisio G (2007) Real time
road signs recognition. In: Proceedings of IEEE intelligent
vehicles symposium 2007, Istanbul, Turkey, pp 981–986
3. Cagnoni S, Mordonini M, Sartori J (2007) Particle swarm opti-
mization for object detection and segmentation. In: Applications
of evolutionary computing. Proceeding of EvoWorkshops 2007,
Springer, pp 241–250
4. Cagnoni S, Mussi L, Daolio F (2009) Empirical assessment of the effects of update synchronization in Particle Swarm Optimization. In: Poster and workshop proceedings of the XI conference of the Italian association for artificial intelligence, Reggio Emilia, Italy. Electronic version, ISBN 978-88-903581-1-1
5. Anton Canalis L, Hernandez Tejera M, Sanchez Nielsen E (2006)
Particle swarms as video sequence inhabitants for object tracking
in computer vision. In: Proceedings of IEEE international con-
ference on intelligent systems design and applications (ISDA’06),
pp 604–609
6. Engelbrecht AP (2007) Computational Intelligence: an Intro-
duction, 2nd edn. Wiley, England
7. de la Escalera A, Armingol JM, Mata M (2003) Traffic sign recognition and analysis for intelligent vehicles. Image Vis Comput 21(3):247–258
8. de la Escalera A, Moreno LE, Puente EA, Salichs MA (1994)
Neural traffic sign recognition for autonomous vehicles. In:
Proceedings of IEEE 20th international conference on industrial
electronics, control and instrumentation 2:841–846
9. Gao X, Shevtsova N, Hong K, Batty S, Podladchikova L, Golo-
van A, Shaposhnikov D, Gusakova V (2002) Vision models based
identification of traffic signs. In: Proceedings of European con-
ference on color in graphics image and vision. Poitiers, France,
pp 47–51
10. Gavrila D (1999) Traffic sign recognition revisited. In: Muster-
erkennung 1999, 21. DAGM-symposium. Springer, pp 86–93
11. Hoessler H, Wohler C, Lindner F, Kreßel U (2007) Classifier
training based on synthetically generated samples. In: Proceed-
ings of 5th international conference on computer vision systems.
Bielefeld, Germany
12. Ivekovic S, John V, Trucco E (2010) Markerless multi-view
articulated pose estimation using adaptive hierarchical particle
swarm optimisation. In: Di Chio C et al (eds) Applications of
evolutionary computing: proceedings of EvoApplications 2010,
Istanbul, Turkey, Part I, LNCS 6024, Springer, pp 241–250
13. Jiang G-Y, Choi TY (1998) Robust detection of landmarks in
color image based on fuzzy set theory. In: Proceedings of IEEE
4th international conference on signal processing 2:968–971
14. Kailath T (1967) The divergence and Bhattacharyya distance
measures in signal selection. IEEE Trans Commun Technol
15(1):52–60
15. Kennedy J, Eberhart R (1995) Particle swarm optimization. In:
Proceedings IEEE international conference on neural networks,
IV, IEEE, New York, pp 1942–1948
16. Kennedy J, Mendes R (2002) Population structure and particle
swarm performance. In: Proceedings of congress on evolutionary
computation—CEC, IEEE, pp 1671–1676
17. Hyukseong K, Park J, Kak A (2007) A new approach for active
stereo camera calibration. In: Proceedings of IEEE international
conference on robotics and automation, pp 3180–3185
18. Liang J, Qin A, Suganthan P, Baskar S (2006) Comprehensive
learning particle swarm optimizer for global optimization of
multimodal functions. IEEE Trans Evol Comput 10(3):281–295
19. Loy G, Barnes N (2004) Fast shape-based road signs detection for
a driver assistance system. In: Proceedings of IEEE/RSJ inter-
national conference on intelligent robots and systems, Sendai,
Japan, pp 70–75
20. Matsumoto M, Nishimura T (1998) Mersenne Twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul 8(1):3–30
21. Mussi L, Cagnoni S (2008) Artificial creatures for object tracking
and segmentation. In: Applications of evolutionary computing:
proceedings of EvoWorkshops 2008, Springer, pp 255–264
22. Mussi L, Cagnoni S (2009) Particle swarm for pattern matching
in image analysis. In: Serra R et al (eds) Proceedings of WIVACE
2008, Italian Workshop on artificial life and evolutionary com-
puting, World Scientific, pp 89–98
23. Mussi L, Daolio F, Cagnoni S (2010) Evaluation of particle
swarm optimization algorithms within the CUDA architecture.
Inf Sci. doi:10.1016/j.ins.2010.08.045
24. Nguwi Y, Kouzani, A (2006) A study on automatic recognition of
road signs. In: Proceedings of IEEE conference on cybernetics
and intelligent systems. Bangkok, Thailand, pp 1–6
25. nVIDIA Corporation (2009) nVIDIA CUDA Programming Guide
v. 2.3. http://www.nvidia.com/object/cuda_develop.html
26. Montes de Oca M, Stutzle T, Birattari M, Dorigo M (2009)
Frankenstein’s PSO: a composite particle swarm optimization
algorithm. IEEE Trans Evol Comput 13(5):1120–1132
27. Owechko Y, Medasani S (2005) A swarm-based volition/atten-
tion framework for object recognition. In: Proceedings of IEEE
conference on computer vision and pattern recognition—work-
shops (CVPR’05). IEEE, pp 91–91
28. Poli R (2008) Analysis of the publications on the applications of
particle swarm optimisation. J Artif Evol Appl 2008(1):1–10
29. Poli R, Kennedy J, Blackwell T (2007) Particle swarm optimi-
zation: an overview. Swarm Intel 1(1):33–57
30. Tsai RY (1987) A versatile camera calibration technique for
high-accuracy 3D machine vision metrology using off-the-
shelf TV cameras and lenses. IEEE J Robot Autom 3:323–
344
31. Shadeed WG, Abu-Al Nadi DI, Mismar MJ (2003) Road traffic
sign detection in color images. In: Proceedings of IEEE 10th
international conference on electronics, circuits and systems
2:890–893
32. Soetedjo A, Yamada K (2005) Fast and robust traffic sign
detection. In: Proceedings of IEEE international conference on
systems, man and cybernetics 2:1341–1346
33. Sonka M, Hlavac V, Boyle R (2007) Image processing, analysis,
and machine vision, 3rd edn. CL-Engineering
34. Taiana M, Nascimento J, Gaspar J, Bernardino A (2008) Sample-
based 3D tracking of colored objects: a flexible architecture. In:
Proceedings of British machine vision conference (BMVC’08).
BMVA, pp 1–10
35. Vitabile S, Pollaccia G, Pilato G (2001) Road signs recognition
using a dynamic pixel aggregation technique in the HSV color
space. In: Proceedings of international conference on image
analysis and processing, Palermo, Italy, pp 572–577
36. Wei GQ, Ma SD (1994) Implicit and explicit camera calibration:
theory and experiments. IEEE Trans Pattern Anal Machine Intell
16(5):469–480
37. Zhang Z (2000) A flexible new technique for camera calibration. IEEE Trans Pattern Anal Machine Intell 22(11):1330–1334. http://research.microsoft.com/~zhang/Calib/
38. Zhang X, Hu W, Maybank S, Li X, Zhu M (2008) Sequential
particle swarm optimization for visual tracking. In: Proceedings
of IEEE conference on computer vision and pattern recognition
(CVPR’08). IEEE, pp 1–8
39. Ziraknejad N, Tafazoli S, Lawrence P (2007) Autonomous stereo
camera parameter estimation for outdoor visual servoing. In:
Proceedings of IEEE Workshop on machine learning for signal
processing, pp 157–162