Analyzing Traffic Density in Images with Low Temporal and Spatial Resolution
Michael Claveria Dept. of Information and Computer Sciences
University of Hawai`i at Mānoa 1680 East-West Road, Honolulu, HI 96822, USA
Kyungim Baek* Dept. of Information and Computer Sciences
University of Hawai`i at Mānoa 1680 East-West Road, Honolulu, HI 96822, USA
ABSTRACT
The increasing proliferation of traffic monitoring technology has
brought about sophisticated techniques for traffic monitoring such
as motion tracking using active or optical sensors. Image
processing techniques to identify vehicles and track velocity are
possible using real time video feedback from traffic cameras
along major roads and highways. However, many cities have
limitations on camera and equipment quality which obstruct
traffic monitoring processes. In Honolulu, the traffic images
posted on the traffic monitoring website have a 3-minute delay
between frames. This makes it impossible to perform vehicle
tracking based on those images. Variations in camera angles and
low spatial resolution also make the task of monitoring traffic
more difficult. In this paper two simple traffic density estimators
with two different background models are implemented and
compared to each other. The estimator first separates traffic
foreground from road background using moving average or
codebook methods. A modified Hough transformation identifies
potential road area and then the traffic density is quantified as
percentage of traffic contained within the road area of an image.
These techniques deal with the limitations of traffic images with
low spatial resolution and low frame rate.
Categories and Subject Descriptors
I.4.8 [Image Processing and Computer Vision]: Scene Analysis
General Terms
Algorithms, Performance, Experimentation
Keywords
Traffic density analysis, Background, Foreground, Moving
average, Codebook, Classification
1. INTRODUCTION
Traffic monitoring devices have become more commonplace in
the lives of everyday motorists as techniques for detecting traffic
have become more sophisticated over time. According to a recent
urban mobility report, traffic congestion has been increasing over
the years and costs the nation $121 billion a year in wasted travel
time and fuel [1]. Real-time traffic analysis can provide users
with information on real-time travel options to avoid congestion
based on current road transit and parking conditions. It can also
aid in the future development of transportation systems by
providing data on traffic patterns.
One common approach to traffic monitoring includes using active
sensors which are typically radar, laser, or acoustic based (e.g. [2,
3, 4]). These sensors are considered active because they detect the
objects by measuring the travel time of a signal emitted by the
sensors and reflected by the objects [5]. The main advantage of
active sensors is their ability to measure quantities like distance
without requiring powerful computing resources and sophisticated
processes.
Optical sensors (cameras), on the other hand, are referred to as
passive sensors because they obtain data in a nonintrusive way.
The advantage of passive sensors is the lower cost for
implementation and maintenance. Passive sensors can also
provide visual information that, when processed, can be used for
tasks such as object identification (pedestrians and other objects)
that active sensors cannot. Their viewing angle can also be adjusted
much more easily than the area of effect of a radar sensor. In the
past 30 years there have been tremendous strides in using optical
sensors to quantify traffic patterns on major roads and highways.
The vision based traffic monitoring process has improved with the
growth of the computer vision field, the proliferation of feasible
technology, and the exponential increase in processor speeds.
In one of the classic papers classifying traffic using a computer
vision technique, Ridder et al. [6] modeled each pixel in a frame
using a Kalman Filter to predict the traffic density. Koller et al.
[7] then used this model to create an automatic traffic monitoring
application. The model was robust to lighting changes in the
scene; however, it recovered slowly and did not handle bimodal
backgrounds well. Wren et al. [8] used a multi-class statistical
model, Pfinder, for tracking objects and a single Gaussian model
for each pixel in the background. This produced good results;
however, its application was limited to indoor scenes and it was not
tested on outdoor scenes. Friedman and Russell [9] used a pixel-
wise EM framework for detection of vehicles. They classified
pixel values into three separate distributions corresponding to road
color, shadow color and vehicle color. For this study it is not clear
how this works for pixels that present multiple background colors
such as those resulting from repetitive motions or reflectance.
Stauffer and Grimson [10] used a mixture of Gaussians for each
pixel to deal with lighting changes and repetitive motions of
objects. It successfully tracked people and cars in an outdoor
environment and adjusted to changes in the background over time,
but it relied on video with high temporal resolution.
Li et al. [11] proposed a real-time virtual loop detector
(mimicking the idea of a physical loop detector) by using a
boosted support vector machine classifier to probabilistically
determine the traffic density state. They achieved an average
accuracy of around 95% under different daytime illumination and
weather conditions. However, their method also relied on a frame rate
high enough to track vehicles across successive frames.
* Author to whom correspondence should be addressed.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. IVCNZ’14, November 19 – 21 2014, Hamilton, New Zealand Copyright 2014 ACM 978-1-4503-3184-5/14/11…$15.00 http://dx.doi.org/10.1145/2683405.2683412
Official Hawaii state traffic cameras located at over 90
intersections on the island of Oahu report images of the traffic
scene to a Honolulu County traffic website every 3 minutes [12].
Such extremely low refresh rates provide traffic snapshots of
individual locations, but make it difficult to track traffic over time.
It is also difficult for an individual to process multiple traffic
images around the entire state all at once. A solution is to
automate the process to let a computer analyze traffic images and
report the traffic density conditions.
Two popular methods of measuring traffic density include
counting vehicles by means of motion sensor devices along the
road and using GPS systems to track cell phone location of a
motorist on the road. Automated computer vision processes,
especially on the island of Oahu, provide another such opportunity
for vehicle analysis due to the proliferation of cameras along the
non-freeway roads of the island. Image processing is also much
more cost effective than sensors and motion detectors since it
does not require additional hardware setup and maintenance
beyond the original cameras.
While many of the previously proposed vision based traffic
analysis methods were successful in tracking and identifying
objects within images, these methods are not fully applicable to
analyzing traffic density in images taken from traffic cameras on
Oahu. The publicly available traffic images from the Honolulu
traffic cameras have low spatial resolution and long intervals
between successive images from one location. With a 3-minute
interval, there is little chance that a car in one image will be
present in the next, so methods that rely on motion
analysis across images are not applicable. Some methods that
were designed to work indoors do not capture the changes in
lighting and shadows that occur with outdoor traffic conditions.
Others that used algorithms to identify individual objects could
only work for images with high enough spatial resolution. Low
spatial resolution makes identifying individual cars difficult since
a group of cars looks like one large clump in a highly pixelated
image. Finally, a few methods use complex algorithms that might
identify traffic down to the single pixel, but could not
be used practically due to long processing times. Since traffic is
classified qualitatively (heavy, light, etc.), a reasonable
estimate of traffic density is sufficient for classification.
The model presented in this paper is part of a process to create
and maintain a traffic monitoring system that accurately and
efficiently displays qualitative traffic density conditions in real
time for locations with an installed traffic camera. An application
for this work is to extend automated traffic monitoring coverage
to the arterial roads under the surveillance of traffic cameras.
When used together with other popular applications that already
measure traffic congestion along freeways and major roads, it
would help to complete traffic monitoring around the entire
island. This information could be used in several applications in
the future such as developing a real-time automated best route
finder or monitoring traffic conditions over a long period of time.
Because the images are gathered in real time, this preliminary
model is a streamlined approach with computing times of less than a
second. For simplicity, this work focused on images from four
cameras placed on the intersections of: University Ave. and Dole
St., King St. and Bishop St., King St. and Punchbowl St., and
Beretania St. and Punchbowl St. (Figure 1). A sequence of 30
training images from each site is gathered and used to build
background models that are subsequently used to estimate the
traffic density of a set of 20 test images from those sites.
Figure 1. Sample traffic images of the four different
intersections (University avenue and Dole street, King and
Bishop streets, King and Punchbowl streets, and Beretania
and Punchbowl streets).
2. TRAFFIC DENSITY ESTIMATION
2.1 Background Modeling
Detection of foreground traffic requires creating a background
model using training images. The two different approaches used
to model the background are Moving Average and Codebook
methods.
Moving Average
Moving average models the background by creating an ideal “no
traffic” image to use as a base image for traffic comparisons.
Images are averaged over time with higher weight given to more
recent images. With newest data point of weight n, and each
successive data point decreasing in weight by 1, we calculate the
weighted moving average for n considered points by

$$MA_t = \frac{n\,p_t + (n-1)\,p_{t-1} + \cdots + 2\,p_{t-(n-2)} + p_{t-(n-1)}}{n + (n-1) + \cdots + 2 + 1}$$

where weight n is given to the most recent image $p_t$. When incorporating a new image after the n images, replace the
oldest image following the equations:

$$\mathrm{Total}_{t+1} = \mathrm{Total}_t + p_{t+1} - p_{t-(n-1)}$$
$$\mathrm{Numerator}_{t+1} = \mathrm{Numerator}_t + n\,p_{t+1} - \mathrm{Total}_t$$
$$MA_{t+1} = \frac{\mathrm{Numerator}_{t+1}}{n + (n-1) + \cdots + 2 + 1}$$

where $\mathrm{Total}_t$ is $p_t + \cdots + p_{t-(n-1)}$.
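The incremental update can be illustrated with a short Python sketch. It is not the C++/OpenCV implementation used in this work: the class name `WeightedMovingAverage` and the use of scalar samples in place of whole images are assumptions made for brevity. A per-pixel version would apply the same arithmetic to every pixel of the image.

```python
from collections import deque


class WeightedMovingAverage:
    """Weighted moving average over the last n samples.

    The newest sample has weight n and the oldest has weight 1.
    Samples are plain numbers here (an assumption for illustration).
    """

    def __init__(self, initial):
        # `initial` holds the first n samples, oldest first.
        self.window = deque(initial)
        self.n = len(self.window)
        # n + (n-1) + ... + 2 + 1
        self.denominator = self.n * (self.n + 1) // 2
        self.total = sum(self.window)
        # The oldest sample gets weight 1, the newest gets weight n.
        self.numerator = sum(w * p for w, p in enumerate(self.window, start=1))

    def value(self):
        return self.numerator / self.denominator

    def update(self, p_new):
        # Numerator_{t+1} = Numerator_t + n * p_{t+1} - Total_t
        self.numerator += self.n * p_new - self.total
        # Total_{t+1} = Total_t + p_{t+1} - p_{t-(n-1)}
        oldest = self.window.popleft()
        self.window.append(p_new)
        self.total += p_new - oldest
        return self.value()


def direct_wma(samples):
    """Reference computation straight from the definition (oldest first)."""
    n = len(samples)
    return sum(w * p for w, p in enumerate(samples, start=1)) / (n * (n + 1) // 2)


wma = WeightedMovingAverage([10, 20, 30])
updated = wma.update(40)  # window is now [20, 30, 40]
```

Each update touches only the newest and oldest samples, so its cost does not grow with n.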
Due to the frequency of cars passing, it is difficult to find an
image without vehicles in it that could serve as an ideal
background image. One means of getting around this is by
constructing an ideal background image by averaging out vehicles
over time. Calculating a weighted moving average of a set of
images effectively creates a “no traffic” image (Figure 2). This
method finds the mean pixel intensity over a series of training
images. The idea is that most of the images will have the ideal
background pixel while a few others will have a car in the
background for a given pixel. Creating the average image will
theoretically resemble the background more than the vehicles.
This method gives weighted priority to more recent images to
better reflect current conditions. After each image is evaluated the
image is added to the vector of test images to enhance the current
sample of images. Thus the system will learn and theoretically be
more accurate as it gathers more images over time. It can also
adapt to changes in the background over time.
Figure 2. The moving average images for each of the four
intersections generated from 30 training images from each site
shown in Figure 1.
Codebook
A second method of constructing a background model uses a
structure called codebook to differentiate between foreground and
background [13, 14]. Codebook looks at frequencies of each pixel
value over a sequence of images. If there are enough values that
extend beyond a single range of values, codebook accepts discrete
ranges of values (Figure 3).
Figure 3. A simple illustration of codebook formation
reproduced based on the figure in [13]. The original pixel
values grow the box bounds over time and larger values that
occur later in the graph create two discrete box ranges. If this
is the finalized codebook background model, any values that
fit into the range of those two boxes at the end will be
considered part of the background.
In this work, we model codebook using simple boxes that cover
common values as seen over time. Each box is defined by min
and max thresholds for each of the three color axes. The box
grows (i.e. the bounding thresholds expand) if the newer
background samples fall within a learning threshold above the
max or below the min threshold. Any background samples that
fall outside every existing box start a new box. The codebook method can
handle pixels whose values might change dramatically but still
keep to a discrete range. It is more robust in handling changes in
the background than moving average and can adapt to shifting
shadows and lighting with enough training images, but it is more
computationally intensive than the moving average method.
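The box-growing rule can be sketched for a single pixel as follows. This is a simplified Python illustration, not the implementation used in this work: the class `BoxCodebook`, the parameter `learn_bound`, and its default value of 10 are hypothetical, and any refinements of the full method in [13, 14] are omitted.

```python
class BoxCodebook:
    """Simplified codebook for one pixel with three color channels."""

    def __init__(self, learn_bound=10):
        # Hypothetical learning threshold: how far outside a box a
        # sample may fall and still be absorbed by that box.
        self.learn_bound = learn_bound
        self.boxes = []  # each box is [mins, maxs], one value per channel

    @staticmethod
    def _fits(box, pixel, margin):
        mins, maxs = box
        return all(lo - margin <= v <= hi + margin
                   for lo, hi, v in zip(mins, maxs, pixel))

    def train(self, pixel):
        """Grow the first box the sample fits into, or start a new box."""
        for box in self.boxes:
            if self._fits(box, pixel, self.learn_bound):
                box[0] = [min(lo, v) for lo, v in zip(box[0], pixel)]
                box[1] = [max(hi, v) for hi, v in zip(box[1], pixel)]
                return
        self.boxes.append([list(pixel), list(pixel)])

    def is_background(self, pixel, margin=0):
        """A pixel is background if it lies inside any learned box."""
        return any(self._fits(box, pixel, margin) for box in self.boxes)


codebook = BoxCodebook(learn_bound=10)
for sample in [(100, 100, 100), (105, 98, 102), (112, 101, 99),
               (200, 200, 200), (205, 198, 203)]:
    codebook.train(sample)
```

In this example the five training samples settle into two boxes, so both the darker and the brighter appearance of the pixel are treated as background while values in between are not.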
2.2 Foreground Extraction Process
Once the background model is constructed from the training
images by the moving average or the codebook method, it is
converted to a grayscale background image and the foreground
traffic in a test image is found by subtracting the background
image from a grayscale version of the test image. Then, dilation
and erosion operations are applied to the resulting image using a
3 × 3 block structuring element to remove noise and to fill the holes
of larger regions [15]. Thresholding is applied to the image after
the morphological operators to produce a black and white image.
The white pixels in the binary image should represent the area
where vehicles occupy the scene. Finally, a connected component
analysis is performed to remove small regions and to fill in gaps
between regions that should be connected [15].
Figure 4. (a) Original image. (b) Moving average background.
(c) Result of subtracting background (b) from (a) and
applying threshold. (d) Result of the connected components
performed on (c) and superimposed over the image (a).
Figure 4 shows the results of the foreground extraction process.
Image (d) shows that the morphological operators followed by
connected component analysis clean up the foreground image (c)
formed by background subtraction. Note how some of the
extraneous features, such as trees, as well as some smaller cars in
the background are removed in image (d). By adjusting
parameters one can vary the regions identified as foreground.
2.3 Extracting Region of Interest (Road Area)
The traffic images have background area that cannot contain
traffic but can nevertheless throw off results due to excess noise.
Such areas include the sky, buildings in the background, trees and
sidewalk area. For more accurate results it is important to identify
the region of potential traffic, i.e. the road area.
The road area can be found manually by cropping out buildings,
sidewalks and non-essential foreground areas, but this is
tedious, especially when each image must be cropped by hand.
One possible approach to this problem is to use Canny edge
detection [16] to calculate the outlines of objects within the
background image and then use a modified Hough transformation
[17] to detect the extended road lines. In each of the averaged
images, the longest lines in the image are likely to correspond to
outlines of the main road extending through the image. One can
then calculate the area within the longest lines to estimate the road
region. Classification of traffic density is done using the found
foreground traffic area together with the calculated region of
interest road area.
Figure 5. (a) Background image. (b) Results of the Canny edge
detection done on the image (a). (c) The lines calculated from
the thresholded Hough transformation. (d) The road area in
red calculated by extending the lines in (c) and filling in the
area within those boundary lines.
Figure 6. Calculated road areas for the other three
intersections. Although the areas do not capture the perfect
road area, they offer a reasonably good approximation of
region of interest which can be used to filter out noise in non-
road portions of the images.
Using Canny edge detection on the background image (Figure
5(a)), we get the outlines of objects in the image (Figure 5(b)).
After finding the edges in an image, applying a thresholded
Hough transformation captures the long lines in the image which
signify the road lines (Figure 5(c)). The Hough transform keeps
track of any line with the equation y = mx + b for slope m and y-
intercept b. Any line can be uniquely identified by the pair of
slope and y intercept (m, b). In the implementation, the polar
coordinate representation of a line is used to construct the
parameter space due to the problems with (m, b) representation –
unbounded parameter domain and infinite m for vertical lines. At
each pixel the algorithm determines if there is enough evidence of
an edge or a line segment that connects through that pixel. Each
unique line contains a particular evaluation value in the algorithm
and more evidence of specific line parameters will increase the
evaluation value of that line. The lines with the largest value are
then chosen as the most visible lines in the image. Since road
lengths span a large portion of the image, a high enough threshold
will isolate those large lines. If we extend the lines and fill in the
area, we find the road region in the image (Figure 5(d)). Figure 6
shows the extracted road areas for the other three locations.
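The voting scheme in the polar parameter space can be sketched as follows. This Python fragment shows the standard Hough transform only; the modification used in this work is not detailed in the text and is not reproduced here. The function name `hough_lines`, the 1-degree angular resolution, and the vote threshold are assumptions chosen for illustration.

```python
import math


def hough_lines(edge_points, n_theta=180, vote_threshold=10):
    """Accumulate votes in (rho, theta) space and return strong lines.

    edge_points: iterable of (x, y) pixel coordinates flagged as edges.
    Returns (rho, theta, votes) tuples sorted by votes, strongest first.
    """
    thetas = [math.pi * k / n_theta for k in range(n_theta)]
    votes = {}
    for x, y in edge_points:
        for k, theta in enumerate(thetas):
            # Polar form of a line: rho = x*cos(theta) + y*sin(theta).
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            votes[(rho, k)] = votes.get((rho, k), 0) + 1
    lines = [(rho, thetas[k], v) for (rho, k), v in votes.items()
             if v >= vote_threshold]
    lines.sort(key=lambda line: line[2], reverse=True)
    return lines


# A vertical edge at x = 5 plus two stray edge pixels.
points = [(5, y) for y in range(20)] + [(12, 3), (17, 9)]
strong = hough_lines(points, vote_threshold=15)
```

Because every edge pixel on a long line votes for the same (rho, theta) cell, raising `vote_threshold` keeps only long lines, which is the property used to isolate road boundaries. Neighbouring angle cells can collect nearly as many votes, so a practical implementation usually merges near-duplicate lines.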
2.4 Classification
The easiest way to classify the traffic density would be to count
the number of foreground pixels located in the road area.
Foreground traffic would only be counted if it is within a buffer
region of the calculated potential road area. Since different traffic
camera views have different sized road areas, traffic data can be
extracted based on the ratio of the calculated foreground area to
the potential road area of that camera’s image. This ratio can be
broken down into qualitative classifications of traffic. For
example, a 0.025 ratio for a certain intersection might mean light
traffic while 0.5 might indicate heavy traffic. Since the
classification of images between light, medium, heavy, etc.
remains arbitrary, specifying exact cut off values for a
classification would best be done after monitoring a system over
time. This way one could make better judgments as to the
accuracy of such measurements as judged by comparison to
human interpretations of the images. Since assignment of the
classification is a judgment call and may differ based on the
intersection and the camera, no defined classification has been
reached in this work at the moment. Such a classification can be
done through experimental analysis and can be compared to
human interpretation for evaluative purposes.
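The ratio described above can be computed from two binary masks, as in the following Python sketch. The function names are hypothetical, the buffer region around the road area is omitted, and the cut-off values of 0.1 and 0.3 are placeholders chosen only so that the two example ratios from the text (0.025 and 0.5) fall into different classes. No cut-off values have been established in this work.

```python
def traffic_density(foreground, road):
    """Fraction of road-area pixels that are flagged as foreground.

    Both arguments are binary masks (lists of rows of 0/1) of equal size.
    """
    road_pixels = sum(sum(row) for row in road)
    if road_pixels == 0:
        raise ValueError("road mask is empty")
    overlap = sum(1
                  for frow, rrow in zip(foreground, road)
                  for f, r in zip(frow, rrow)
                  if f and r)
    return overlap / road_pixels


def classify(ratio, cutoffs=((0.1, "light"), (0.3, "medium"))):
    """Map a ratio to a label using illustrative, uncalibrated cut-offs."""
    for upper, label in cutoffs:
        if ratio < upper:
            return label
    return "heavy"


road = [[0, 0, 0, 0],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 0]]
foreground = [[1, 0, 0, 0],   # this pixel is outside the road and ignored
              [1, 1, 0, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0]]
ratio = traffic_density(foreground, road)
```

Dividing by the road area rather than the image area keeps ratios comparable across cameras whose views contain different amounts of road.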
3. RESULTS
The estimation system was implemented in C++ with the OpenCV
library. A total of 200 images were gathered for the classification process,
with 50 images from each of the four different camera locations on
the island of Oahu. 30 of those 50 images were chosen at random
and used to train the codebook and moving average models. The
other 20 images from each site were evaluated for traffic content.
The output for each tested image is a number corresponding to
the calculated traffic ratio in that image.
Due to the limited space, results for four test images for each site
are shown in Figure 7. The numbers below each image indicate
the traffic density percentage calculated using moving average
and codebook methods. MA represents the traffic density
calculated from the moving average method of identifying
foreground traffic and CB represents the codebook methods of
identifying foreground traffic. The four test sites show
significantly different results in terms of how they were classified
using the two different methods.
3.1 University Ave. and Dole St.
Codebook and moving average had less variation between them at
University and Dole relative to the variation between the two
methods at the other three sites (first column in Figure 7). They
both performed similarly for this particular intersection, with the
greatest issue being the perspective of the camera. Both methods
appeared to underestimate the traffic in images with many small
cars in the distance and overestimate the traffic in images with
vehicles closer to the camera. The two way traffic presented
another challenge as traffic in the outgoing lane (with vehicles
oriented toward the upper right of the image) would occupy
greater area than the incoming lane (vehicles oriented toward the
lower left of the image) due to the position of the camera. This
particular intersection had limited issues in lighting which would
prove to be a significant issue in the analysis of the other
intersections. Overall, this intersection is easy/medium to classify
relative to the other intersections.
      Univ./Dole   King/Punchbowl   King/Bishop   Beretania/Punchbowl
MA:      9.68          36.96           48.99              4.22
CB:     14.80          23.14            7.04              1.71

MA:     12.63          36.60           37.10              4.31
CB:     13.73          39.94           23.44             17.08

MA:      4.19          41.74           41.66              4.09
CB:      1.12          15.40           44.11             22.83

MA:      6.04          57.06           21.52             12.63
CB:      7.93           4.05            6.59             15.75
Figure 7. Traffic density quantified as percentage of traffic
within the extracted road area for the four intersections.
(From left column to the right column: University avenue and
Dole street, King and Punchbowl streets, King and Bishop
streets, Beretania and Punchbowl streets.)
3.2 King St. and Punchbowl St.
For the set of images used in this work, King and Punchbowl
appeared to be a busier intersection than the other three (second
column in Figure 7). Codebook did a much better job of correctly
identifying low traffic values for the images with few vehicles in
them. The problem with moving average came from the
differences in lighting between the images. Although shadows
were minimal, the effect of changes in lighting over time meant
that the road appeared shiny and light in some images while
darker in others. The estimation process using moving average
was picking up significant portions of the road as foreground
since it is different in color from the background image.
Codebook, on the other hand, could adapt to the lighting changes.
While the camera on King and Punchbowl had a similar
perspective view to the one on University and Dole, the one way
traffic made computing the density easier. Traffic for this
intersection is easy to classify because of its closer proximity to
the road, minimal shadow effects, and one way traffic flow.
3.3 King St. and Bishop St.
King and Bishop had significant lighting changes within the span
of the twenty test images which threw off some results for the two
methods (third column in Figure 7). Moving average based
estimation had high calculated traffic much like that of King and
Punchbowl due to the variability in lighting and the dramatic
shadow effects in the downtown environment. Codebook did a
decent job adapting to the lighting issues except for three images
in the test set with values around 44, 65, and 57. (Only one of the
three images is shown in Figure 7.) These three images had very
little traffic in them and the estimation problem most likely
stemmed from the unique shadow conditions in those three
images. Unfortunately, the training images did not have that
particular shadow pattern as the large shadow appeared to have
entered over the course of a few minutes. One of the problems
with the low temporal resolution is that dramatic changes in
shadows can occur over the course of a few frames. Other than
those three images, codebook did a good job of detecting limited
traffic and correctly identifying higher levels of traffic. This
particular intersection is difficult to classify because of the
dramatic shifts in lighting and shadows most likely due to the
surrounding buildings.
3.4 Beretania St. and Punchbowl St.
Beretania and Punchbowl had smaller traffic density calculations
and overall it appeared to have much less traffic in the images
(fourth column in Figure 7). Both moving average and codebook
methods seemed to have trouble picking up the small cars in the
background. The intersection is at a four way stop and the camera
perspective is pulled back from the road more than the other
intersections. Moving average based estimation had consistently
low values except for some where the shadow of a tree took up a
large section of the road area. It did not do a good job estimating
traffic for this intersection. Codebook based estimation appeared
to identify the few images with heavier traffic. The vast majority
of vehicles in the image appeared clumped together in the
background, most likely due to the traffic light causing them to
stop. Overall this intersection and intersections of this type are
probably the most difficult to classify. The four way intersection
requires a camera to pull back in order to capture traffic coming
from all four directions. This makes the resolution even worse.
Since traffic was mostly in the background, any traffic moving
from right to left or left to right at that intersection would appear
more prominently. This was the case in a few test images. Shadow
effects are also a concern as a large tree shadow progressively
emerged onto the road area through successive images.
4. DISCUSSION
4.1 Problems with Moving Average
The first classification attempt using moving average as a
background model performed poorly for images with high levels
of contrast from shadows and lighting. Moving average resulted in
a neutral background image so shadow or light details in an
original image are identified as foreground traffic after
background subtraction. Certain intersections with high levels of
trees and buildings in the surrounding area, such as the one at
King and Bishop streets, give off patterned light areas at certain
times of day (Figure 8).
Implementation of codebook helped to correct for lighting
changes since the training images captured over the course of the
day had enough shadow and lighting effects for codebook to adapt
to the different background conditions (Figure 9). There is a large
difference between codebook and moving average for several of
the test images from King and Bishop streets, and King and
Punchbowl streets. Moving average tends to overestimate
foreground traffic by counting different lighting patterns as traffic.
(a) (b) (c) (d)
Figure 8. (a) This traffic image from King and Bishop streets
shows interesting shadow and lighting effects. (b) The moving
average background image taken over the course of one day
creates a neutral image. (c) Background subtraction and
thresholded image reveals several lighted areas are perceived
as foreground. (d) The calculated foreground image is
superimposed over original image (a).
In general, foreground extraction with codebook produced
respectable estimates of relative traffic. The process of evaluating
methods is somewhat imperfect since there is no best measure for
a qualitative process of evaluating traffic density. However,
codebook has discernible problems identifying background on a
larger level although some pixels around the larger areas were
added. These were not significant enough to warrant concern
since traffic density analysis does not require such precision.
Figure 9. The codebook result of the same street image as in
Figure 8(a) is shown in the middle. It has a faded
identification of lighting which, after thresholding and post-
processing, results in the image on the right. Such a
classification would be successful since there are no cars in the
image and there should not be any white foreground pixels.
4.2 Perspective View of Camera
The camera position differs for different intersections and rarely is
head on with the traffic. This results in the skewing of perspective
angles. For example, the University and Dole traffic camera
perspective makes cars in the lower right-hand area of the image
larger while causing cars in the incoming lane to appear smaller
(see the first image in Figure 7). As with most images, the
vehicles further away from the camera take up less space than
those closer to the camera which can skew results for images with
many cars in the distance or a few large cars up close. Some cars
in the distant background are treated as noise if their size is not
large enough to register for the connected component threshold.
4.3 Different Sites
The different intersections provided different results for the
different models. On the University Ave. and Dole St. intersection
the codebook and moving average performed similarly on the
analysis with the weakness that both tend to disregard cars in the
distant background. The images from the intersections of
University Ave. and Dole St., and Beretania and Punchbowl
streets tend to have a smaller proportion of traffic relative to those
of the other two intersections. Due to the camera angle, individual
cars tend to be smaller in those intersections as opposed to King
and Bishop streets, and King and Punchbowl streets.
From the test images, King and Punchbowl streets, and King and
Bishop streets have the most variance in lighting and shadows
between images. This caused the largest difference between
moving average and codebook calculations within the same
image. King and Punchbowl streets appear to have more traffic in
the test images as compared to the other intersections.
University Ave. and Dole St. provide the additional challenge of
two way traffic, while Beretania and Punchbowl streets have even
more complications as a four way stop. There is no way for the
current model to take into account heavy traffic in one direction
but not the other since it only measures total traffic throughout the
entire image. One possibility is to have multiple traffic models,
one for each lane and direction. However, that runs into the
additional challenge of identifying the car direction which is
difficult for a single image or a sequence of images with a large
between-image interval.
5. CONCLUSIONS AND FUTURE WORK
The method studied in this paper provides a simple and efficient
approach to traffic density estimation when faced with the
limitations of low spatial resolution and large intervals between
successive images. The 3-minute interval between images
makes tracking vehicles virtually impossible and rules out the
use of sophisticated tracking algorithms. Low spatial resolution
makes it difficult to identify or find the exact outline of a vehicle
since the image is highly pixelated.
The moving average background model captured traffic well under
ideal lighting conditions, but the codebook model worked better for
images with heavy shadows and lighting issues. A modified Hough
transformation identified the road lines in an image reasonably
well, and these lines were used to identify the road area. The
simple traffic density estimation method had trouble dealing with
camera perspective, and its accuracy differed across the four
tested traffic sites due to the unique challenges of each
intersection. This work suggests an initial procedure for
classification of traffic density in real time, which can be built
upon by implementing more advanced techniques in the future.
Even though it is simplified to deal with technological constraints,
this traffic density estimation system does a sufficient job of
analyzing traffic patterns around the island of Oahu.
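The core of the moving average estimator can be summarized in a few lines. The following is a minimal sketch on grayscale arrays, not the implementation used in the experiments: the blending factor alpha, the difference threshold, and the function names are assumptions chosen for illustration, and the road mask is given by hand here rather than derived from Hough lines.

```python
import numpy as np

def update_background(background, frame, alpha=0.05):
    """Blend a new frame into the running-average background (alpha is assumed)."""
    return (1.0 - alpha) * background + alpha * frame

def traffic_density(frame, background, road_mask, threshold=30.0):
    """Percentage of road pixels that differ from the background by more than threshold."""
    foreground = np.abs(frame - background) > threshold
    road_pixels = int(road_mask.sum())
    if road_pixels == 0:
        return 0.0
    return 100.0 * float((foreground & road_mask).sum()) / road_pixels

# Toy data: an empty scene with uniform intensity 100.
empty = np.full((4, 8), 100.0)
background = empty.copy()
for _ in range(3):
    background = update_background(background, empty)

# The road occupies the two middle rows (16 pixels).
road_mask = np.zeros((4, 8), dtype=bool)
road_mask[1:3, :] = True

# New frame: bright vehicles cover half of the road; a bright object
# off the road (top row) must not count towards the density.
frame = empty.copy()
frame[1:3, 0:4] = 200.0
frame[0, :] = 200.0

density = traffic_density(frame, background, road_mask)
print(density)
```

Restricting the count to the road mask is what keeps off-road changes, such as the bright top row in the toy frame, from inflating the estimate.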
There are a number of ways to improve the current work so that it
can deal with some of the difficulties previously discussed and
with more complicated situations. For example, transforming the
street image to a top-down view could help with the perspective
issue. Vehicles in the distance would then occupy roughly the same
area as similarly sized vehicles closer to the camera.
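One common way to obtain such a top-down view is a planar homography estimated from four point correspondences. The sketch below is only an illustration of that idea: the corner coordinates of the road trapezoid and the size of the target rectangle are made-up values, and the function names are hypothetical.

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for the 3x3 homography H (with H[2, 2] = 1) mapping four src points to dst."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1*x + h2*y + h3) / (h7*x + h8*y + 1), and similarly for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, point):
    """Apply H to a single (x, y) point using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Assumed road corners in the camera image (a trapezoid: narrow far edge,
# wide near edge) and the rectangle they should map to in the top-down view.
road_in_image = [(140, 60), (180, 60), (300, 230), (20, 230)]
road_top_down = [(0, 0), (100, 0), (100, 300), (0, 300)]

H = homography_from_points(road_in_image, road_top_down)
warped = [warp_point(H, p) for p in road_in_image]
print(np.round(warped, 3))
```

After the transformation the 40-pixel far edge and the 280-pixel near edge of the road both span the same width, so a foreground pixel count is no longer dominated by vehicles near the camera. To warp a whole image rather than individual points, OpenCV's cv2.getPerspectiveTransform and cv2.warpPerspective could likely be used in place of the hand-written solver.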
Another problem to consider is that cars of similar color to the
road are sometimes treated as background and removed from the
image. Future implementations may consider a separate color
component, such as hue, to avoid the problem. Better filtering
methods could also be used so that the threshold can be tuned to
recognize more subtle differences between vehicle features and the
road.
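To illustrate how a hue component might help, the sketch below flags a pixel as foreground if either its brightness or its hue differs enough from the background. The thresholds, the pixel values, and the function names are assumptions for illustration only. Hue is also unstable for nearly gray pixels, so a practical version would probably need to take saturation into account.

```python
import colorsys

def hue_distance(rgb_a, rgb_b):
    """Circular distance between the hues of two RGB pixels, in the range [0, 0.5]."""
    hue_a = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb_a))[0]
    hue_b = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb_b))[0]
    diff = abs(hue_a - hue_b)
    return min(diff, 1.0 - diff)  # hue wraps around, so 0.95 and 0.05 are close

def brightness(rgb):
    """Mean of the three channels, used here as a simple intensity measure."""
    return sum(rgb) / 3.0

def is_foreground(pixel, background, brightness_thresh=30.0, hue_thresh=0.1):
    """Foreground if brightness OR hue differs from the background (assumed thresholds)."""
    brightness_differs = abs(brightness(pixel) - brightness(background)) > brightness_thresh
    hue_differs = hue_distance(pixel, background) > hue_thresh
    return brightness_differs or hue_differs

road = (110, 105, 100)  # slightly warm gray road surface (made-up values)
car = (90, 105, 120)    # bluish car with exactly the same mean brightness

missed_by_brightness = abs(brightness(car) - brightness(road)) <= 30.0
caught_with_hue = is_foreground(car, road)
print(missed_by_brightness, caught_with_hue)
```

In this toy case a brightness-only test would discard the car as background, while the added hue test keeps it.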
One more future goal could be to recommend the best path between
locations, avoiding heavy traffic as detected by the cameras. The
results for four locations could be expanded to incorporate all
90+ locations where the Honolulu Traffic Cameras are installed.
Individual traffic images from a single location do not provide
much information, but as a network they can work together to
recommend a path of least traffic congestion from a starting point
to a destination.
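One way such a recommendation could work is a shortest-path search in which each road segment is weighted by the density measured at its camera. The sketch below uses Dijkstra's algorithm on a small graph. The node names, the connectivity, and the density values are entirely hypothetical and do not correspond to real camera locations or measurements.

```python
import heapq

def least_congested_path(graph, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts graph.

    graph[a][b] is the non-negative cost of travelling from a to b, here a
    (hypothetical) traffic density percentage for that segment.
    Returns (total_cost, path), or (inf, []) if the goal is unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return best[goal], path[::-1]

# Hypothetical network: the route through B is congested, the one through C is not.
graph = {
    "A": {"B": 80.0, "C": 20.0},
    "B": {"D": 10.0},
    "C": {"D": 30.0},
    "D": {},
}

cost, path = least_congested_path(graph, "A", "D")
print(cost, path)
```

Summing densities along a route is only one possible cost. Segment length or expected travel time would probably need to be included before such a recommendation is useful in practice.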
6. REFERENCES [1] Schrank, D., Eisele, B., and Lomax, T. 2012. TTI’s 2012
Urban Mobility Report. Texas A&M Transportation Institute.
[2] Sen, R., Maurya, A., Raman, B., Mehta, R., Kalyanaraman,
R., Roy, S., and Siriah, P. 2012. Kyun queue: A sensor
network system to monitor road traffic queues. In
Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems (November, 2012), 127-140.
[3] Wang, C., Thorpe, C., and Suppe, A. 2003. Ladar-based
detection and tracking of moving objects from a ground
vehicle at high speeds. In Proceedings of IEEE Intelligent Vehicles Symposium (June 2003), 416-421.
[4] Barbagli, B., Manes G., Facchini, R., and Manes, A. 2012.
Acoustic sensor network for vehicle traffic monitoring. In
Proceedings of the 1st International Conference on Advances
in Vehicular Systems, Technologies and Applications (Venice, Italy, June 24-29, 2012), 1-6.
[5] Sun, Z., Bebis, G., and Miller, R. 2006. On-road vehicle
detection: A review. IEEE Trans. on Pattern Analysis and Machine Intelligence 28, 5 (May 2005), 694-711.
[6] Ridder, C., Munkelt, O., and Kirchner, H. 1995. Adaptive
background estimation and foreground detection using
Kalman-filtering. In Proceedings of International Conference on Recent Advances in Mechatronics, 193-199.
[7] Koller, D., Weber, J., Huang, T., Malik, J., Ogasawara, G.,
Rao, B., and Russel, S. 1994. Towards robust automatic
traffic scene analysis in real-time. In Proceedings of the
International Conference on Pattern Recognition (Israel, November 1994), 126-131.
[8] Wren, C. R., Azarbayejani, A., Darrell, T., and Pentland,
A.P. 1997. Pfinder: real-time tracking of the human body.
IEEE Trans. on Pattern Analysis and Machine Intelligence
29, 7 (July 1997), 780-785.
[9] Friedman, N. and Russell, S. 1997. Image segmentation in
video sequences: A probabilistic approach. In Proceedings of
the 13th Conference on Uncertainty in Artificial Intelligence
(August 1-3, 1997). UAI ’97. Morgan Kaufmann, San Francisco, CA, 175-181.
[10] Stauffer, C. and Grimson, W. 1999. Adaptive background
picture models for real time tracking. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (Fort Collins, CO., June 1999), 246-252.
[11] Li, Z., Tan, E., Chen, J., and Wassantachat, T. 2008. On
traffic density estimation with a boosted svm classifier
Digital Image Computing: Techniques and Applications,
117-123.
[12] http://www1.honolulu.gov/cameras/traffic.htm
[13] Bradski, G. and Kaehler, A. 2008. Learning OpenCV.
O’Reilly Media, Inc., Sebastopol, CA.
[14] Kim, K., Chalidabhongse, T. H., Harwood, D., and Davis, L.
2005. Real-time foreground-background segmentation using
codebook model. Real Time Imaging 11, 3 (June 2005), 172-185.
[15] Gonzalez, R. C. and Woods, R. 2007. Digital Image
Processing. Prentice-Hall, Inc., Upper Saddle River, NJ.
[16] Canny, J. 1986. A computational approach to edge detection.
IEEE Trans. on Pattern Analysis and Machine Intelligence 8, 6 (November 1986), 679-698.
[17] Kim, Y. and Lyu, S. 1989. Extracting lines using a modified
Hough transformation. Multidimensional Signal Processing Workshop (Pacific Grove, CA., September 6-8, 1989).